Tutorial 4 shows one way to plan the design of a future experiment using EHA.

1. Load the libraries that we will be using

pkg <- c("cmdstanr", "standist", "tidyverse", "RColorBrewer", "patchwork", 
         "brms", "tidybayes", "bayesplot", "future", "parallel", "VGAM", "faux",
         "rstan", "osfr", "here")

lapply(pkg, library, character.only = TRUE)
## This is cmdstanr version 0.9.0.9000
## - CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
## - CmdStan path: /Users/spanis/.cmdstan/cmdstan-2.36.0
## - CmdStan version: 2.36.0
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## Loading 'brms' package (version 2.22.0). Useful instructions
## can be found by typing help('brms'). A more detailed introduction
## to the package is available through vignette('brms_overview').
## This is bayesplot version 1.13.0
## Welcome to faux. For support and examples visit:
## https://debruine.github.io/faux/
## rstan version 2.32.7 (Stan version 2.32.2)
## here() starts at /Users/spanis/Documents/R_Projects/Tutorial_EHA

Set options.

options(brms.backend = "cmdstanr",
        mc.cores = parallel::detectCores(),
        future.fork.enable = TRUE,
        future.rng.onMisuse = "ignore") ## automatically set in RStudio

rstan_options(auto_write = TRUE)

supportsMulticore()
detectCores()
# packageVersion("cmdstanr")
# devtools::session_info("rstan")

theme settings for ggplot

theme_set(
  theme_bw() +
    theme(text = element_text(size = 22, face = "bold"), 
          title = element_text(size = 22, face = "bold"),
          legend.position = "bottom")
)

## Set the amount of dodge in figures
pd <- position_dodge(0.7)
pd2 <- position_dodge(1)

2. A brief outline of the workflow

2.1 - Main background sources

The basic approach is based upon two main sources:

The basic structure and code follows the examples outlined by Solomon Kurz in his ‘power’ blog posts and Lisa Debruine’s faux{} package.

For Solomon’s blog posts, see here: https://solomonkurz.netlify.app/tags/power/

For Lisa’s faux package, see here: https://debruine.github.io/faux/. The vignettes provided by the faux package are particularly helpful and worth working through before you take a deep dive into the code below. There is also an associated journal article, which is worth reading.

And for some simpler worked examples of the general workflow (i.e., simpler than using EHA), take a look at these examples by Rich, which show how you might simulate multi-level data for a range of factorial designs: https://github.com/rich-ramsey/sim_demo

2.2 - Basic workflow

The basic workflow is as follows:

  1. Fit a regression model to existing data.
  2. Use the regression model parameters to simulate new data.
  3. Write a function to create 1000s of datasets and vary parameters of interest (e.g., sample size, trial count, effect size).
  4. Summarise the simulated data to estimate likely power or precision of the various research designs.
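The steps above can be sketched in miniature. Below is a toy sketch (in Python rather than R, with made-up probabilities, and with a crude difference-in-proportions interval standing in for a real regression model) of the simulate-then-summarise loop:

```python
import random
import statistics

random.seed(123)

def simulate_dataset(n_per_cond, p_con, p_inc):
    """Step 2: simulate one binary dataset from assumed condition means."""
    con = [1 if random.random() < p_con else 0 for _ in range(n_per_cond)]
    inc = [1 if random.random() < p_inc else 0 for _ in range(n_per_cond)]
    return con, inc

def estimate_difference(con, inc):
    """Stand-in for the model fit (steps 1 and 4): difference in observed
    proportions with a crude normal-approximation 95% interval."""
    p1, p2 = statistics.mean(con), statistics.mean(inc)
    n = len(con)
    se = ((p1 * (1 - p1) + p2 * (1 - p2)) / n) ** 0.5
    diff = p2 - p1
    return diff - 1.96 * se, diff + 1.96 * se

def power(n_sims, n_per_cond, p_con, p_inc):
    """Step 3: repeat over many simulated datasets; 'power' is the share
    of intervals that exclude zero."""
    hits = 0
    for _ in range(n_sims):
        con, inc = simulate_dataset(n_per_cond, p_con, p_inc)
        lo, hi = estimate_difference(con, inc)
        if lo > 0 or hi < 0:
            hits += 1
    return hits / n_sims

# Larger samples should detect the assumed 0.10 difference more often.
power_small = power(500, 50, 0.10, 0.20)
power_large = power(500, 400, 0.10, 0.20)
```

In the real workflow, the `estimate_difference()` stand-in would be replaced by fitting the regression model to each simulated dataset.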

2.3 - Exceptions to the normal workflow

The basic workflow above is a modified version of that recommended by Solomon and Lisa. In a typical workflow, once the datasets are simulated, a regression model would be fit to each dataset, and the parameters from each model would then be summarised to make a judgment about the likely precision or power of the design. However, some of the models in this tutorial take several hours to build, so we cannot easily fit 1000s of models per variation in parameter settings (such as sample size and/or trial count), as that would take an unreasonably long time on a desktop machine. Instead, we take a simpler approach and summarise the data per bin and condition, rather than the parameter estimates from the model. Although this is not as satisfying, we feel it can still be a useful guide for setting expectations. To further save time and computational resources, for the purposes of this planning tutorial, we also use a smaller dataset than in the earlier tutorials. The details of this simpler dataset and modelling procedure are provided below.
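To make the simplified approach concrete, here is a small sketch (again in Python, with hypothetical hazard values that are not estimates from these data) of summarising each simulated dataset per bin, rather than refitting a model to each one:

```python
import random

random.seed(1)

# Hypothetical per-bin hazard probabilities (illustrative values only).
hazards = {"bin4": 0.01, "bin5": 0.05, "bin6": 0.09}
n_trials = 120
n_sims = 1000

# For each simulated dataset, record the observed hazard per bin.
observed = {b: [] for b in hazards}
for _ in range(n_sims):
    for b, p in hazards.items():
        events = sum(1 for _ in range(n_trials) if random.random() < p)
        observed[b].append(events / n_trials)

def qi(values, lo=0.025, hi=0.975):
    """95% quantile interval across simulations."""
    values = sorted(values)
    return values[int(lo * len(values))], values[int(hi * len(values)) - 1]

# The spread of the observed bin means across simulations approximates
# the precision the design affords, without refitting a model per dataset.
intervals = {b: qi(v) for b, v in observed.items()}
```

Narrower intervals for a given sample size and trial count indicate a design that pins down the per-bin hazards more precisely.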

3. Read in the data

Here, we again use data from Panis and Schmidt (2016), but to make things simpler, we focus on only six time bins (time bins 4-9) and two prime conditions (congruent and incongruent).

read in and wrangle the raw data

data <- read_csv("Tutorial_4_planning/data/tidyppp_Exp1_n6.csv") %>%
  ## no mask condition only, neutral prime removed, time bins 4-9 only
  filter(no_ma == 1, no_pr < 1, timebin %in% c(4:9)) %>% 
  ## remove unnecessary columns
  select(-c(bl, target_dir, corresp, resp, respac, rel_ma, irr_ma, ran_ma, no_ma)) %>%
  ## rename the error column
  rename(error = resperr) %>% 
  ## create a new variable and some factors
  mutate(prime = if_else(con_pr == 1, "con", "inc"),
         prime = factor(prime,
                        levels = c("con", "inc")),
         timebin = factor(timebin)) %>% 
  ## simplify the data by selecting a few variables only
  select(pid, timebin, prime, outcome)
## Rows: 93660 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (21): pid, bl, target_dir, corresp, resp, respac, resperr, rt, trial_n, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(data)
## # A tibble: 6 × 4
##     pid timebin prime outcome
##   <dbl> <fct>   <fct>   <dbl>
## 1     1 4       con         0
## 2     1 5       con         0
## 3     1 6       con         0
## 4     1 7       con         0
## 5     1 8       con         0
## 6     1 9       con         0
str(data)
## tibble [7,385 × 4] (S3: tbl_df/tbl/data.frame)
##  $ pid    : num [1:7385] 1 1 1 1 1 1 1 1 1 1 ...
##  $ timebin: Factor w/ 6 levels "4","5","6","7",..: 1 2 3 4 5 6 1 2 3 4 ...
##  $ prime  : Factor w/ 2 levels "con","inc": 1 1 1 1 1 1 2 2 2 2 ...
##  $ outcome: num [1:7385] 0 0 0 0 0 0 0 0 0 0 ...

data check

data %>% 
  distinct(timebin, prime)
## # A tibble: 12 × 2
##    timebin prime
##    <fct>   <fct>
##  1 4       con  
##  2 5       con  
##  3 6       con  
##  4 7       con  
##  5 8       con  
##  6 9       con  
##  7 4       inc  
##  8 5       inc  
##  9 6       inc  
## 10 7       inc  
## 11 8       inc  
## 12 9       inc

create summary data

data_pid <- data %>% 
  group_by(pid, timebin, prime) %>% 
  summarise(outcome = mean(outcome, na.rm = TRUE))
## `summarise()` has grouped output by 'pid', 'timebin'. You can override using
## the `.groups` argument.
head(data_pid)
## # A tibble: 6 × 4
## # Groups:   pid, timebin [3]
##     pid timebin prime outcome
##   <dbl> <fct>   <fct>   <dbl>
## 1     1 4       con    0     
## 2     1 4       inc    0     
## 3     1 5       con    0     
## 4     1 5       inc    0     
## 5     1 6       con    0.0667
## 6     1 6       inc    0.075
data_group <- data %>% 
  group_by(timebin, prime) %>% 
  summarise(outcome = mean(outcome, na.rm = TRUE))
## `summarise()` has grouped output by 'timebin'. You can override using the
## `.groups` argument.
head(data_group)
## # A tibble: 6 × 3
## # Groups:   timebin [3]
##   timebin prime outcome
##   <fct>   <fct>   <dbl>
## 1 4       con   0.00975
## 2 4       inc   0.0139 
## 3 5       con   0.0478 
## 4 5       inc   0.0423 
## 5 6       con   0.0901 
## 6 6       inc   0.0913

quick plot

p3.1 <- ggplot(data_group, aes(x=timebin, y = outcome, colour = prime)) +
   geom_line(aes(group = prime, linetype = prime)) + geom_point() +
   geom_jitter(data=data_pid, alpha = 0.5, width = 0.1, height = 0) +
   scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25)) +
   scale_x_discrete(labels = c(1:6)) +
   scale_fill_brewer(palette = "Dark2") +
   scale_colour_brewer(palette = "Dark2") +
   labs(title = "Panis & Schmidt (2016) (N=6)",
        x = "time bins")
p3.1

check the number of trials

tally <- data %>% 
  group_by(pid, prime, timebin) %>%
  summarise(n=n(),
            sum=sum(outcome),
            .groups = "drop")
tally
## # A tibble: 72 × 5
##      pid prime timebin     n   sum
##    <dbl> <fct> <fct>   <int> <dbl>
##  1     1 con   4         120     0
##  2     1 con   5         120     0
##  3     1 con   6         120     8
##  4     1 con   7         112    19
##  5     1 con   8          93    26
##  6     1 con   9          67    28
##  7     1 inc   4         120     0
##  8     1 inc   5         120     0
##  9     1 inc   6         120     9
## 10     1 inc   7         111    12
## # ℹ 62 more rows

4. Build an initial model

In general, it can often be a lot simpler to use an index coding approach to build regression models.

See this link for some background on the index coding approach: https://bookdown.org/content/4857/the-many-variables-the-spurious-waffles.html#many-categories.

And this Stan forum thread shows one way to specify the interaction term with index coding in brms: https://discourse.mc-stan.org/t/specifying-formulates-for-interaction-between-categorical-variables-with-the-index-coding-approach-in-brms/29449/3

And in the context of data simulation, index coding can be particularly useful because it is often easier and more intuitive to set condition means than to set more complicated regression parameters, such as interaction terms.

The approach here is to build an initial model using an index coding approach, but with a subset of the data that was used in prior tutorials.

Because we are building Bayesian regression models, which generate a posterior distribution per condition of interest under our index coding approach, we can simply calculate contrasts of interest and associated precision or interval estimates (95% quantile intervals, for example) to address any questions we may have about comparisons between conditions, such as incongruent vs. congruent in a particular time bin.
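As a toy illustration of that idea (in Python, with fake normally distributed draws standing in for the real posterior), a contrast and its 95% quantile interval are just draw-wise arithmetic followed by quantiles:

```python
import random

random.seed(7)

# Fake posterior draws for two condition means on the cloglog scale
# (stand-ins only; the real draws come from the fitted brms model).
draws_con = [random.gauss(-1.5, 0.2) for _ in range(4000)]
draws_inc = [random.gauss(-2.3, 0.3) for _ in range(4000)]

# With index coding, the contrast of interest is computed draw by draw...
contrast = sorted(i - c for i, c in zip(draws_inc, draws_con))

# ...and summarised with a median and a 95% quantile interval.
n = len(contrast)
median = contrast[n // 2]
lower, upper = contrast[int(0.025 * n)], contrast[int(0.975 * n) - 1]
```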

formula

with varying intercept and slope for prime by pid

formula = bf(outcome ~ 0 + timebin:prime +
               (0 + timebin:prime | pid))

check the priors available

get_prior(formula,
          data = data, family = bernoulli(link = "cloglog"))
##                 prior class              coef group resp dpar nlpar lb ub
##                (flat)     b                                              
##                (flat)     b timebin4:primecon                            
##                (flat)     b timebin4:primeinc                            
##                (flat)     b timebin5:primecon                            
##                (flat)     b timebin5:primeinc                            
##                (flat)     b timebin6:primecon                            
##                (flat)     b timebin6:primeinc                            
##                (flat)     b timebin7:primecon                            
##                (flat)     b timebin7:primeinc                            
##                (flat)     b timebin8:primecon                            
##                (flat)     b timebin8:primeinc                            
##                (flat)     b timebin9:primecon                            
##                (flat)     b timebin9:primeinc                            
##                lkj(1)   cor                                              
##                lkj(1)   cor                     pid                      
##  student_t(3, 0, 2.5)    sd                                          0   
##  student_t(3, 0, 2.5)    sd                     pid                  0   
##  student_t(3, 0, 2.5)    sd timebin4:primecon   pid                  0   
##  student_t(3, 0, 2.5)    sd timebin4:primeinc   pid                  0   
##  student_t(3, 0, 2.5)    sd timebin5:primecon   pid                  0   
##  student_t(3, 0, 2.5)    sd timebin5:primeinc   pid                  0   
##  student_t(3, 0, 2.5)    sd timebin6:primecon   pid                  0   
##  student_t(3, 0, 2.5)    sd timebin6:primeinc   pid                  0   
##  student_t(3, 0, 2.5)    sd timebin7:primecon   pid                  0   
##  student_t(3, 0, 2.5)    sd timebin7:primeinc   pid                  0   
##  student_t(3, 0, 2.5)    sd timebin8:primecon   pid                  0   
##  student_t(3, 0, 2.5)    sd timebin8:primeinc   pid                  0   
##  student_t(3, 0, 2.5)    sd timebin9:primecon   pid                  0   
##  student_t(3, 0, 2.5)    sd timebin9:primeinc   pid                  0   
##        source
##       default
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##       default
##  (vectorized)
##       default
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)
##  (vectorized)

visualise priors

Using an index coding approach, each condition has a prior for the mean of that condition. And as we know from plot 3.1 above, the condition averages across the six selected time bins range from near zero to 0.50 on the hazard probability scale. In principle, the more general case could involve any hazard value from zero to 1. Therefore, if we try to stick to a weakly informative approach to priors (see here for more details), then the priors for the cloglog values should cover a fairly broad range.

For a look at values across the cloglog scale and how it compares to the logit scale, see here.

Ok, so let’s take a look at some cloglog values at the lower and upper end of the probability scale.

cloglog_vals = clogloglink(c(0.01,0.99))
cloglog_vals
## [1] -4.600149  1.527180
# -4.600149  1.527180
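For reference, the cloglog link is cloglog(p) = log(−log(1 − p)) and its inverse is p = 1 − exp(−exp(x)). A quick check in Python (rather than VGAM's clogloglink()) reproduces the two values above:

```python
import math

def cloglog(p):
    """Complementary log-log link: log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))

def inv_cloglog(x):
    """Inverse link, mapping the real line back to (0, 1)."""
    return 1.0 - math.exp(-math.exp(x))

lower = cloglog(0.01)            # ~ -4.600
upper = cloglog(0.99)            # ~  1.527
round_trip = inv_cloglog(lower)  # ~  0.01
```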

and now take a look at some distributions

visualize("normal(0, 2)", "normal(0, 1)", "normal(0, 0.5)",
          xlim = c(-4, 4))

normal(0, 2) looks like it covers all of our likely values, even if it is a little broader than necessary.

For simplicity, we use these normal(0, 2) priors for the mean per condition. However, if we were doing this for real, we would probably choose priors similar to those discussed in Tutorial_2a and in the Supplementary Materials.
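One quick way to see that normal(0, 2) is broad enough is a prior predictive check: draw from the candidate prior and push the draws through the inverse cloglog link. A sketch in Python (the tutorial itself stays in R):

```python
import math
import random

random.seed(42)

def inv_cloglog(x):
    """Inverse cloglog link: hazard = 1 - exp(-exp(x))."""
    return 1.0 - math.exp(-math.exp(x))

# Draw from the candidate prior and look at the hazards it implies.
draws = [random.gauss(0.0, 2.0) for _ in range(10000)]
implied = sorted(inv_cloglog(x) for x in draws)

# The central 95% of implied hazards spans most of (0, 1), which is
# what a weakly informative prior on this scale should do.
lo = implied[int(0.025 * len(implied))]
hi = implied[int(0.975 * len(implied)) - 1]
```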

set priors

We set the prior for each condition mean to normal(0, 2) and then use some default recommendations from McElreath (2020) for the rest.

priors <- c(
  set_prior("normal(0, 2)", class = "b"),
  set_prior("normal(0, 1)", class = "sd"),
  set_prior("lkj(2)", class = "cor")
)

run the model

This model takes ~17 minutes to build on a 2020 MacBook Pro (2 GHz Quad-Core Intel Core i5). For this tutorial, the chunk is skipped to save time. Set eval=TRUE in the code chunk to run the model yourself.

plan(multicore)
bi <- brm(formula = formula,
        data = data, family = bernoulli(link = "cloglog"),
        prior = priors,
        iter = 2000, warmup = 1000, cores = 8, chains = 4,
        control = list(adapt_delta = 0.95),
        save_pars = save_pars(all=TRUE),
        seed = 123,
        init = 0.01,
        file = "Tutorial_4_planning/models/bi")
summary(bi)

The model built without any concerns, errors or warnings.

At this point, you would normally do the following:

  • Check the model by looking at the convergence of chains and other model diagnostics. We’ll skip this step here, just to move on to the main focus of planning and data simulation.

  • Summarise and visualise the parameter estimates and/or posterior predictions.

Below, we quickly summarise fixed effects from the model in cloglog values, and we also convert them to hazard values, which tend to be far easier to interpret.
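The conversion from the cloglog scale to hazard values uses the inverse link, hazard = 1 − exp(−exp(b)). For example, in Python, applied to two of the fixed-effect medians from the model summary:

```python
import math

def inv_cloglog(x):
    """Map a cloglog-scale estimate back to a hazard probability."""
    return 1.0 - math.exp(-math.exp(x))

# Two fixed-effect estimates from the model summary (cloglog scale):
hazard_bin4_con = inv_cloglog(-4.66)  # early bin: hazard near 0.009
hazard_bin9_con = inv_cloglog(-0.60)  # late bin: hazard near 0.42
```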

visualise fixed effect parameters from model bi

read in the existing model object (if you did not build the model yourself)

bi <- readRDS("Tutorial_4_planning/models/bi.rds")
summary(bi)
##  Family: bernoulli 
##   Links: mu = cloglog 
## Formula: outcome ~ 0 + timebin:prime + (0 + timebin:prime | pid) 
##    Data: data (Number of observations: 7385) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Multilevel Hyperparameters:
## ~pid (Number of levels: 6) 
##                                          Estimate Est.Error l-95% CI u-95% CI
## sd(timebin4:primecon)                        1.06      0.53     0.15     2.24
## sd(timebin5:primecon)                        1.34      0.44     0.66     2.35
## sd(timebin6:primecon)                        0.60      0.26     0.25     1.26
## sd(timebin7:primecon)                        0.47      0.21     0.19     0.98
## sd(timebin8:primecon)                        0.58      0.25     0.24     1.24
## sd(timebin9:primecon)                        0.72      0.28     0.34     1.41
## sd(timebin4:primeinc)                        1.17      0.52     0.31     2.37
## sd(timebin5:primeinc)                        1.05      0.41     0.46     2.06
## sd(timebin6:primeinc)                        0.91      0.32     0.45     1.67
## sd(timebin7:primeinc)                        1.47      0.43     0.77     2.44
## sd(timebin8:primeinc)                        1.34      0.43     0.67     2.33
## sd(timebin9:primeinc)                        0.62      0.32     0.12     1.38
## cor(timebin4:primecon,timebin5:primecon)     0.18      0.24    -0.30     0.60
## cor(timebin4:primecon,timebin6:primecon)     0.14      0.25    -0.35     0.59
## cor(timebin5:primecon,timebin6:primecon)     0.19      0.23    -0.27     0.63
## cor(timebin4:primecon,timebin7:primecon)     0.15      0.24    -0.33     0.60
## cor(timebin5:primecon,timebin7:primecon)     0.15      0.24    -0.32     0.60
## cor(timebin6:primecon,timebin7:primecon)     0.16      0.24    -0.32     0.61
## cor(timebin4:primecon,timebin8:primecon)     0.06      0.25    -0.41     0.53
## cor(timebin5:primecon,timebin8:primecon)     0.00      0.23    -0.44     0.45
## cor(timebin6:primecon,timebin8:primecon)     0.03      0.24    -0.44     0.49
## cor(timebin7:primecon,timebin8:primecon)     0.08      0.24    -0.40     0.54
## cor(timebin4:primecon,timebin9:primecon)     0.08      0.25    -0.41     0.55
## cor(timebin5:primecon,timebin9:primecon)     0.09      0.23    -0.38     0.52
## cor(timebin6:primecon,timebin9:primecon)     0.05      0.24    -0.43     0.51
## cor(timebin7:primecon,timebin9:primecon)     0.05      0.24    -0.41     0.52
## cor(timebin8:primecon,timebin9:primecon)     0.18      0.24    -0.31     0.62
## cor(timebin4:primecon,timebin4:primeinc)     0.16      0.25    -0.34     0.61
## cor(timebin5:primecon,timebin4:primeinc)     0.22      0.23    -0.24     0.65
## cor(timebin6:primecon,timebin4:primeinc)     0.17      0.24    -0.32     0.61
## cor(timebin7:primecon,timebin4:primeinc)     0.15      0.24    -0.31     0.60
## cor(timebin8:primecon,timebin4:primeinc)     0.04      0.24    -0.42     0.49
## cor(timebin9:primecon,timebin4:primeinc)     0.08      0.24    -0.39     0.54
## cor(timebin4:primecon,timebin5:primeinc)     0.15      0.24    -0.33     0.60
## cor(timebin5:primecon,timebin5:primeinc)     0.22      0.23    -0.25     0.65
## cor(timebin6:primecon,timebin5:primeinc)     0.16      0.24    -0.31     0.61
## cor(timebin7:primecon,timebin5:primeinc)     0.16      0.25    -0.32     0.61
## cor(timebin8:primecon,timebin5:primeinc)     0.04      0.24    -0.43     0.48
## cor(timebin9:primecon,timebin5:primeinc)     0.04      0.24    -0.44     0.50
## cor(timebin4:primeinc,timebin5:primeinc)     0.17      0.24    -0.31     0.62
## cor(timebin4:primecon,timebin6:primeinc)     0.13      0.24    -0.34     0.58
## cor(timebin5:primecon,timebin6:primeinc)     0.15      0.23    -0.31     0.57
## cor(timebin6:primecon,timebin6:primeinc)     0.19      0.24    -0.29     0.64
## cor(timebin7:primecon,timebin6:primeinc)     0.18      0.24    -0.30     0.62
## cor(timebin8:primecon,timebin6:primeinc)     0.07      0.23    -0.39     0.52
## cor(timebin9:primecon,timebin6:primeinc)     0.01      0.23    -0.43     0.47
## cor(timebin4:primeinc,timebin6:primeinc)     0.15      0.23    -0.31     0.58
## cor(timebin5:primeinc,timebin6:primeinc)     0.16      0.23    -0.30     0.60
## cor(timebin4:primecon,timebin7:primeinc)     0.09      0.24    -0.39     0.53
## cor(timebin5:primecon,timebin7:primeinc)     0.08      0.22    -0.35     0.50
## cor(timebin6:primecon,timebin7:primeinc)     0.14      0.23    -0.30     0.58
## cor(timebin7:primecon,timebin7:primeinc)     0.14      0.23    -0.31     0.57
## cor(timebin8:primecon,timebin7:primeinc)     0.20      0.23    -0.25     0.62
## cor(timebin9:primecon,timebin7:primeinc)     0.11      0.23    -0.35     0.55
## cor(timebin4:primeinc,timebin7:primeinc)     0.10      0.24    -0.37     0.53
## cor(timebin5:primeinc,timebin7:primeinc)     0.11      0.23    -0.33     0.55
## cor(timebin6:primeinc,timebin7:primeinc)     0.21      0.23    -0.25     0.63
## cor(timebin4:primecon,timebin8:primeinc)     0.07      0.24    -0.40     0.54
## cor(timebin5:primecon,timebin8:primeinc)     0.03      0.22    -0.40     0.46
## cor(timebin6:primecon,timebin8:primeinc)     0.09      0.23    -0.36     0.53
## cor(timebin7:primecon,timebin8:primeinc)     0.14      0.24    -0.32     0.57
## cor(timebin8:primecon,timebin8:primeinc)     0.17      0.23    -0.31     0.59
## cor(timebin9:primecon,timebin8:primeinc)     0.01      0.23    -0.43     0.47
## cor(timebin4:primeinc,timebin8:primeinc)     0.05      0.24    -0.41     0.50
## cor(timebin5:primeinc,timebin8:primeinc)     0.11      0.23    -0.33     0.54
## cor(timebin6:primeinc,timebin8:primeinc)     0.19      0.23    -0.27     0.61
## cor(timebin7:primeinc,timebin8:primeinc)     0.25      0.22    -0.19     0.65
## cor(timebin4:primecon,timebin9:primeinc)     0.06      0.25    -0.44     0.54
## cor(timebin5:primecon,timebin9:primeinc)     0.04      0.24    -0.42     0.48
## cor(timebin6:primecon,timebin9:primeinc)     0.01      0.24    -0.46     0.50
## cor(timebin7:primecon,timebin9:primeinc)     0.04      0.24    -0.44     0.50
## cor(timebin8:primecon,timebin9:primeinc)     0.09      0.25    -0.40     0.54
## cor(timebin9:primecon,timebin9:primeinc)     0.15      0.25    -0.35     0.61
## cor(timebin4:primeinc,timebin9:primeinc)     0.05      0.25    -0.46     0.51
## cor(timebin5:primeinc,timebin9:primeinc)    -0.01      0.24    -0.47     0.47
## cor(timebin6:primeinc,timebin9:primeinc)    -0.01      0.24    -0.48     0.45
## cor(timebin7:primeinc,timebin9:primeinc)    -0.01      0.24    -0.47     0.44
## cor(timebin8:primeinc,timebin9:primeinc)    -0.04      0.24    -0.50     0.43
##                                          Rhat Bulk_ESS Tail_ESS
## sd(timebin4:primecon)                    1.00     1575      946
## sd(timebin5:primecon)                    1.00     2481     2988
## sd(timebin6:primecon)                    1.00     2305     3263
## sd(timebin7:primecon)                    1.00     2073     2563
## sd(timebin8:primecon)                    1.00     2334     2939
## sd(timebin9:primecon)                    1.00     2575     3003
## sd(timebin4:primeinc)                    1.00     2200     1735
## sd(timebin5:primeinc)                    1.00     3162     2990
## sd(timebin6:primeinc)                    1.00     2699     3051
## sd(timebin7:primeinc)                    1.00     2835     2542
## sd(timebin8:primeinc)                    1.00     3978     3135
## sd(timebin9:primeinc)                    1.00     1920     1493
## cor(timebin4:primecon,timebin5:primecon) 1.00     2579     2755
## cor(timebin4:primecon,timebin6:primecon) 1.00     3235     2896
## cor(timebin5:primecon,timebin6:primecon) 1.00     3775     3240
## cor(timebin4:primecon,timebin7:primecon) 1.00     3403     2959
## cor(timebin5:primecon,timebin7:primecon) 1.00     3891     3211
## cor(timebin6:primecon,timebin7:primecon) 1.00     3464     3294
## cor(timebin4:primecon,timebin8:primecon) 1.00     2597     2992
## cor(timebin5:primecon,timebin8:primecon) 1.00     3771     3027
## cor(timebin6:primecon,timebin8:primecon) 1.00     3425     3031
## cor(timebin7:primecon,timebin8:primecon) 1.00     3417     3089
## cor(timebin4:primecon,timebin9:primecon) 1.00     3508     3062
## cor(timebin5:primecon,timebin9:primecon) 1.00     3333     3269
## cor(timebin6:primecon,timebin9:primecon) 1.00     3878     3057
## cor(timebin7:primecon,timebin9:primecon) 1.00     3440     3278
## cor(timebin8:primecon,timebin9:primecon) 1.00     3203     2802
## cor(timebin4:primecon,timebin4:primeinc) 1.00     3562     2973
## cor(timebin5:primecon,timebin4:primeinc) 1.00     3666     2937
## cor(timebin6:primecon,timebin4:primeinc) 1.00     4437     3223
## cor(timebin7:primecon,timebin4:primeinc) 1.00     3549     3408
## cor(timebin8:primecon,timebin4:primeinc) 1.00     4029     3557
## cor(timebin9:primecon,timebin4:primeinc) 1.00     4154     3375
## cor(timebin4:primecon,timebin5:primeinc) 1.00     3651     3170
## cor(timebin5:primecon,timebin5:primeinc) 1.00     5235     3550
## cor(timebin6:primecon,timebin5:primeinc) 1.00     4083     3353
## cor(timebin7:primecon,timebin5:primeinc) 1.00     3654     3361
## cor(timebin8:primecon,timebin5:primeinc) 1.00     3569     3243
## cor(timebin9:primecon,timebin5:primeinc) 1.00     4171     3303
## cor(timebin4:primeinc,timebin5:primeinc) 1.00     3555     3378
## cor(timebin4:primecon,timebin6:primeinc) 1.00     3405     3068
## cor(timebin5:primecon,timebin6:primeinc) 1.00     4233     3473
## cor(timebin6:primecon,timebin6:primeinc) 1.00     3645     3073
## cor(timebin7:primecon,timebin6:primeinc) 1.00     3966     3441
## cor(timebin8:primecon,timebin6:primeinc) 1.00     3942     3390
## cor(timebin9:primecon,timebin6:primeinc) 1.00     3490     3451
## cor(timebin4:primeinc,timebin6:primeinc) 1.00     3394     3450
## cor(timebin5:primeinc,timebin6:primeinc) 1.00     3364     3127
## cor(timebin4:primecon,timebin7:primeinc) 1.00     2789     2925
## cor(timebin5:primecon,timebin7:primeinc) 1.00     3968     3263
## cor(timebin6:primecon,timebin7:primeinc) 1.00     3561     3246
## cor(timebin7:primecon,timebin7:primeinc) 1.00     3904     3129
## cor(timebin8:primecon,timebin7:primeinc) 1.00     4104     3483
## cor(timebin9:primecon,timebin7:primeinc) 1.00     3650     3046
## cor(timebin4:primeinc,timebin7:primeinc) 1.00     3065     3355
## cor(timebin5:primeinc,timebin7:primeinc) 1.00     3444     3085
## cor(timebin6:primeinc,timebin7:primeinc) 1.00     3049     3637
## cor(timebin4:primecon,timebin8:primeinc) 1.00     3448     3249
## cor(timebin5:primecon,timebin8:primeinc) 1.00     4224     2985
## cor(timebin6:primecon,timebin8:primeinc) 1.00     3937     3479
## cor(timebin7:primecon,timebin8:primeinc) 1.00     3616     3515
## cor(timebin8:primecon,timebin8:primeinc) 1.00     3591     3346
## cor(timebin9:primecon,timebin8:primeinc) 1.00     4010     3616
## cor(timebin4:primeinc,timebin8:primeinc) 1.00     2563     2990
## cor(timebin5:primeinc,timebin8:primeinc) 1.00     3669     3445
## cor(timebin6:primeinc,timebin8:primeinc) 1.00     3298     3284
## cor(timebin7:primeinc,timebin8:primeinc) 1.00     2613     3550
## cor(timebin4:primecon,timebin9:primeinc) 1.00     4337     3141
## cor(timebin5:primecon,timebin9:primeinc) 1.00     4579     3537
## cor(timebin6:primecon,timebin9:primeinc) 1.00     4460     3489
## cor(timebin7:primecon,timebin9:primeinc) 1.00     3474     3310
## cor(timebin8:primecon,timebin9:primeinc) 1.00     3563     3130
## cor(timebin9:primecon,timebin9:primeinc) 1.00     4193     3345
## cor(timebin4:primeinc,timebin9:primeinc) 1.00     3087     3390
## cor(timebin5:primeinc,timebin9:primeinc) 1.00     3446     3189
## cor(timebin6:primeinc,timebin9:primeinc) 1.00     3237     3300
## cor(timebin7:primeinc,timebin9:primeinc) 1.00     3336     3373
## cor(timebin8:primeinc,timebin9:primeinc) 1.00     3066     3561
## 
## Regression Coefficients:
##                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## timebin4:primecon    -4.66      0.63    -5.96    -3.44 1.00     2425     2603
## timebin5:primecon    -3.24      0.61    -4.43    -2.01 1.00     1680     1803
## timebin6:primecon    -2.32      0.30    -2.93    -1.72 1.00     2041     2286
## timebin7:primecon    -1.49      0.23    -1.96    -1.02 1.00     2030     1901
## timebin8:primecon    -1.21      0.28    -1.77    -0.63 1.00     1783     1971
## timebin9:primecon    -0.60      0.32    -1.23     0.07 1.00     1931     2216
## timebin4:primeinc    -4.37      0.61    -5.57    -3.13 1.00     2868     2386
## timebin5:primeinc    -3.21      0.50    -4.18    -2.19 1.00     2349     2460
## timebin6:primeinc    -2.37      0.41    -3.12    -1.48 1.00     2078     2205
## timebin7:primeinc    -2.27      0.64    -3.56    -0.99 1.00     2088     2403
## timebin8:primeinc    -3.04      0.59    -4.14    -1.80 1.00     2662     2729
## timebin9:primeinc    -2.45      0.32    -3.11    -1.79 1.00     3116     2732
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

wrangle the posterior distribution

## take the posterior draws
post <- as_draws_df(bi) %>% 
  select(-lp__) %>% 
  as_tibble()

## create a summary, and use medians with robust = TRUE
post_summary <- posterior_summary(bi, robust = TRUE)

## just look at the fixed effects
post_qi_b <- post %>%
  select(starts_with("b_")) %>%
  pivot_longer(everything()) %>%
  group_by(name) %>% 
  median_qi(value)
head(post_qi_b)
## # A tibble: 6 × 7
##   name                value .lower .upper .width .point .interval
##   <chr>               <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1 b_timebin4:primecon -4.63  -5.96  -3.44   0.95 median qi       
## 2 b_timebin4:primeinc -4.35  -5.57  -3.13   0.95 median qi       
## 3 b_timebin5:primecon -3.24  -4.43  -2.01   0.95 median qi       
## 4 b_timebin5:primeinc -3.21  -4.18  -2.19   0.95 median qi       
## 5 b_timebin6:primecon -2.32  -2.93  -1.72   0.95 median qi       
## 6 b_timebin6:primeinc -2.38  -3.12  -1.48   0.95 median qi
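Since the model is on the cloglog scale, the per-bin difference (inc minus con) can be exponentiated to give an approximate hazard ratio. A quick sketch using the rounded posterior medians from the summary above (point estimates only; repeating this draw by draw on post would give full posterior intervals for each ratio):

```r
# rounded posterior medians from the summary above (timebins 4-9)
b_con <- c(-4.66, -3.24, -2.32, -1.49, -1.21, -0.60)
b_inc <- c(-4.37, -3.21, -2.37, -2.27, -3.04, -2.45)

# exponentiate the cloglog difference for an approximate hazard ratio (inc / con)
hr <- exp(b_inc - b_con)
round(hr, 2)
# [1] 1.34 1.03 0.95 0.46 0.16 0.16
```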

a quick plot using posterior samples and tidybayes

tidy_fixed <- post %>% 
  select(starts_with("b_"), .chain, .iteration, .draw) %>% # select the fixed effects and draw indices, then rename with simpler labels.
  rename(chain=.chain, iter=.iteration, draw=.draw) %>% 
  pivot_longer(-c(chain, draw, iter)) %>% # move from wide to long
  mutate(key = factor(name, levels=unique(name)),
         bin = rep(1:6, length.out = 48000),
         prime = rep(c("con", "inc"), each = 6, length.out = 48000),
         bin=factor(bin),
         prime=factor(prime))
head(tidy_fixed)
## # A tibble: 6 × 8
##   chain  iter  draw name                 value key                 bin   prime
##   <int> <int> <int> <chr>                <dbl> <fct>               <fct> <fct>
## 1     1     1     1 b_timebin4:primecon -6.09  b_timebin4:primecon 1     con  
## 2     1     1     1 b_timebin5:primecon -2.32  b_timebin5:primecon 2     con  
## 3     1     1     1 b_timebin6:primecon -2.78  b_timebin6:primecon 3     con  
## 4     1     1     1 b_timebin7:primecon -1.40  b_timebin7:primecon 4     con  
## 5     1     1     1 b_timebin8:primecon -1.32  b_timebin8:primecon 5     con  
## 6     1     1     1 b_timebin9:primecon -0.472 b_timebin9:primecon 6     con
tail(tidy_fixed)
## # A tibble: 6 × 8
##   chain  iter  draw name                value key                 bin   prime
##   <int> <int> <int> <chr>               <dbl> <fct>               <fct> <fct>
## 1     4  1000  4000 b_timebin4:primeinc -4.50 b_timebin4:primeinc 1     inc  
## 2     4  1000  4000 b_timebin5:primeinc -3.07 b_timebin5:primeinc 2     inc  
## 3     4  1000  4000 b_timebin6:primeinc -1.23 b_timebin6:primeinc 3     inc  
## 4     4  1000  4000 b_timebin7:primeinc -2.26 b_timebin7:primeinc 4     inc  
## 5     4  1000  4000 b_timebin8:primeinc -3.07 b_timebin8:primeinc 5     inc  
## 6     4  1000  4000 b_timebin9:primeinc -2.34 b_timebin9:primeinc 6     inc
check.labels <- tidy_fixed %>% 
  distinct(key, bin, prime) 
check.labels
## # A tibble: 12 × 3
##    key                 bin   prime
##    <fct>               <fct> <fct>
##  1 b_timebin4:primecon 1     con  
##  2 b_timebin5:primecon 2     con  
##  3 b_timebin6:primecon 3     con  
##  4 b_timebin7:primecon 4     con  
##  5 b_timebin8:primecon 5     con  
##  6 b_timebin9:primecon 6     con  
##  7 b_timebin4:primeinc 1     inc  
##  8 b_timebin5:primeinc 2     inc  
##  9 b_timebin6:primeinc 3     inc  
## 10 b_timebin7:primeinc 4     inc  
## 11 b_timebin8:primeinc 5     inc  
## 12 b_timebin9:primeinc 6     inc
# plot
p4.1 <- ggplot(tidy_fixed, aes(x = bin, y = value, fill=prime)) +  
  stat_halfeye(alpha=0.7) +
  labs(title = "Model bi cloglog",
       x = "time bin", y = "cloglog") +
  scale_fill_brewer(palette = "Dark2") 
p4.1

ggsave ("Tutorial_4_planning/figures/index_cloglog.jpeg",
        width = 6, height = 4, dpi = 800)

calculate hazards per condition per bin in the posterior distribution

tidy_haz <- tidy_fixed %>%
  select(chain, iter, draw, bin, prime, value) %>% 
  mutate(hazard = exp(value))
head(tidy_haz)
## # A tibble: 6 × 7
##   chain  iter  draw bin   prime  value  hazard
##   <int> <int> <int> <fct> <fct>  <dbl>   <dbl>
## 1     1     1     1 1     con   -6.09  0.00227
## 2     1     1     1 2     con   -2.32  0.0981 
## 3     1     1     1 3     con   -2.78  0.0619 
## 4     1     1     1 4     con   -1.40  0.247  
## 5     1     1     1 5     con   -1.32  0.266  
## 6     1     1     1 6     con   -0.472 0.624
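One caveat worth flagging here: exp() recovers the hazard only approximately. The exact inverse of the cloglog link is 1 - exp(-exp(x)), and the two agree closely only when hazards are small. A quick base-R comparison with three example cloglog values like those in the draws above:

```r
eta <- c(-6.09, -2.32, -0.472)      # example cloglog values, as in the draws above

approx_h <- exp(eta)                # the transformation used for tidy_haz
exact_h  <- 1 - exp(-exp(eta))      # exact inverse cloglog, i.e., clogloglink(eta, inverse = TRUE)

round(cbind(approx_h, exact_h), 3)
```

For strongly negative values the two transforms are nearly identical; near zero on the cloglog scale they diverge noticeably (0.624 vs 0.464 here), which is why the exact link is used when simulating data later on.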

plot hazards

p4.2 <- ggplot(tidy_haz, aes(x = bin, y = hazard,
                                    fill=prime)) +  
  stat_halfeye(alpha=0.7) +
  labs(title = "hazard estimates",
       x = "time bin", y = "hazard") +
  scale_fill_brewer(palette = "Dark2") +
  scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25)) 
p4.2
## Warning: Removed 166 rows containing missing values or values outside the scale range
## (`stat_slabinterval()`).

ggsave ("Tutorial_4_planning/figures/index_hazard.jpeg",
        width = 6, height = 4, dpi = 800)

the warning about removed rows is expected: some posterior draws have hazard values above 1 (exp of a positive cloglog value), which fall outside the y-axis limits set in the plot.

5. Simulate a single dataset

Note: For multi-level data, data simulation can be quite a convoluted process, and we therefore strongly recommend reading the power blogs by Solomon Kurz and working through the faux package vignettes before getting stuck into the code below.

read in a prior model, if not already loaded.

bi <- readRDS("Tutorial_4_planning/models/bi.rds")
summary(bi)

Define some parameters

# define parameters
# specify some features of the design
subj_n = 10  # number of subjects
rep_n = 200 # number of trial repeats 
cond_n = 2 # number of conditions
bin_n = 6 # number of time bins
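These settings fix the size of the person-period dataset before we simulate anything, which is worth noting now as a sanity check against the simulated data further below:

```r
# restating the parameters above so this check is self-contained
subj_n <- 10; rep_n <- 200; cond_n <- 2; bin_n <- 6

# expected number of person-period rows: subjects x repeats x conditions x bins
n_rows <- subj_n * rep_n * cond_n * bin_n
n_rows
# [1] 24000
```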

## set fixed effects
## as a reminder
fixef <- as_tibble(fixef(bi), rownames = "term") %>%
  mutate(across(where(is.double), \(x) round(x, 2)))
fixef
## # A tibble: 12 × 5
##    term              Estimate Est.Error  Q2.5 Q97.5
##    <chr>                <dbl>     <dbl> <dbl> <dbl>
##  1 timebin4:primecon    -4.66      0.63 -5.96 -3.44
##  2 timebin5:primecon    -3.24      0.61 -4.43 -2.01
##  3 timebin6:primecon    -2.32      0.3  -2.93 -1.72
##  4 timebin7:primecon    -1.49      0.23 -1.96 -1.02
##  5 timebin8:primecon    -1.21      0.28 -1.77 -0.63
##  6 timebin9:primecon    -0.6       0.32 -1.23  0.07
##  7 timebin4:primeinc    -4.37      0.61 -5.57 -3.13
##  8 timebin5:primeinc    -3.21      0.5  -4.18 -2.19
##  9 timebin6:primeinc    -2.37      0.41 -3.12 -1.48
## 10 timebin7:primeinc    -2.27      0.64 -3.56 -0.99
## 11 timebin8:primeinc    -3.04      0.59 -4.14 -1.8 
## 12 timebin9:primeinc    -2.45      0.32 -3.11 -1.79
## b1 to b6 = time bin 1 to 6.
## c = con
b1c = -4.66
b2c = -3.24
b3c = -2.32
b4c = -1.49
b5c = -1.21
b6c = -0.60

## i = inc
b1i = -4.37
b2i = -3.21
b3i = -2.37
b4i = -2.27
b5i = -3.04
b6i = -2.45

## set varying effects by pid
## as a reminder
varcor <- VarCorr(bi)
glimpse(varcor)
## List of 1
##  $ pid:List of 3
##   ..$ sd : num [1:12, 1:4] 1.062 1.34 0.599 0.468 0.583 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   ..$ cor: num [1:12, 1:4, 1:12] 1 0.1811 0.1402 0.1506 0.0557 ...
##   .. ..- attr(*, "dimnames")=List of 3
##   ..$ cov: num [1:12, 1:4, 1:12] 1.4119 0.2562 0.0839 0.069 0.0335 ...
##   .. ..- attr(*, "dimnames")=List of 3
str(varcor)
## List of 1
##  $ pid:List of 3
##   ..$ sd : num [1:12, 1:4] 1.062 1.34 0.599 0.468 0.583 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:12] "timebin4:primecon" "timebin5:primecon" "timebin6:primecon" "timebin7:primecon" ...
##   .. .. ..$ : chr [1:4] "Estimate" "Est.Error" "Q2.5" "Q97.5"
##   ..$ cor: num [1:12, 1:4, 1:12] 1 0.1811 0.1402 0.1506 0.0557 ...
##   .. ..- attr(*, "dimnames")=List of 3
##   .. .. ..$ : chr [1:12] "timebin4:primecon" "timebin5:primecon" "timebin6:primecon" "timebin7:primecon" ...
##   .. .. ..$ : chr [1:4] "Estimate" "Est.Error" "Q2.5" "Q97.5"
##   .. .. ..$ : chr [1:12] "timebin4:primecon" "timebin5:primecon" "timebin6:primecon" "timebin7:primecon" ...
##   ..$ cov: num [1:12, 1:4, 1:12] 1.4119 0.2562 0.0839 0.069 0.0335 ...
##   .. ..- attr(*, "dimnames")=List of 3
##   .. .. ..$ : chr [1:12] "timebin4:primecon" "timebin5:primecon" "timebin6:primecon" "timebin7:primecon" ...
##   .. .. ..$ : chr [1:4] "Estimate" "Est.Error" "Q2.5" "Q97.5"
##   .. .. ..$ : chr [1:12] "timebin4:primecon" "timebin5:primecon" "timebin6:primecon" "timebin7:primecon" ...
## extract sd
sd <- as_tibble(varcor$pid$sd, rownames = "term") %>%
  mutate(across(where(is.double), \(x) round(x, 2)))
sd
## # A tibble: 12 × 5
##    term              Estimate Est.Error  Q2.5 Q97.5
##    <chr>                <dbl>     <dbl> <dbl> <dbl>
##  1 timebin4:primecon     1.06      0.53  0.15  2.24
##  2 timebin5:primecon     1.34      0.44  0.66  2.35
##  3 timebin6:primecon     0.6       0.26  0.25  1.26
##  4 timebin7:primecon     0.47      0.21  0.19  0.98
##  5 timebin8:primecon     0.58      0.25  0.24  1.24
##  6 timebin9:primecon     0.72      0.28  0.34  1.41
##  7 timebin4:primeinc     1.17      0.52  0.31  2.37
##  8 timebin5:primeinc     1.05      0.41  0.46  2.06
##  9 timebin6:primeinc     0.91      0.32  0.45  1.67
## 10 timebin7:primeinc     1.47      0.43  0.77  2.44
## 11 timebin8:primeinc     1.34      0.43  0.67  2.33
## 12 timebin9:primeinc     0.62      0.32  0.12  1.38
u1c_sd = 1.06   # 
u2c_sd = 1.34   # 
u3c_sd = 0.60   # 
u4c_sd = 0.47   # 
u5c_sd = 0.58   #
u6c_sd = 0.72   # 

u1i_sd = 1.17   # 
u2i_sd = 1.05   # 
u3i_sd = 0.91   # 
u4i_sd = 1.47   # 
u5i_sd = 1.34   #
u6i_sd = 0.62   # 

make a correlation matrix

# varcor <- VarCorr(b3c_out)
# varcor
# str(varcor)

## take each one
sd <- varcor$pid$sd
cor <- varcor$pid$cor
cov <- varcor$pid$cov

## make it tidy
cor_mat_tidy <- as_tibble(cor, rownames = "term") %>% ## 
  rename_with(~str_replace_all(.x, '\\.', '_')) %>% 
  pivot_longer(-term,
               values_to = "value",
               names_to = c("parameter", "term2"),
               names_pattern = "([a-zA-Z_\\d?]+)_([a-zA-Z:\\d?]+)") %>% ## use names_pattern to split each column name at the last "_": the first capture group takes the parameter part (e.g., "Estimate" or "Est_Error") and the second takes the term part (e.g., "timebin4:primecon"). "\\d?" allows optional digits.
  ## the rest of this code just turns the variables into what we want
  filter(parameter == "Estimate") %>%
  select(-parameter) %>% 
  pivot_wider(names_from = "term2",
              values_from = "value") %>% 
  select(-term)
head(cor_mat_tidy)
## # A tibble: 6 × 12
##   `timebin4:primecon` `timebin5:primecon` `timebin6:primecon`
##                 <dbl>               <dbl>               <dbl>
## 1              1                  0.181                0.140 
## 2              0.181              1                    0.194 
## 3              0.140              0.194                1     
## 4              0.151              0.152                0.160 
## 5              0.0557             0.00218              0.0334
## 6              0.0784             0.0865               0.0480
## # ℹ 9 more variables: `timebin7:primecon` <dbl>, `timebin8:primecon` <dbl>,
## #   `timebin9:primecon` <dbl>, `timebin4:primeinc` <dbl>,
## #   `timebin5:primeinc` <dbl>, `timebin6:primeinc` <dbl>,
## #   `timebin7:primeinc` <dbl>, `timebin8:primeinc` <dbl>,
## #   `timebin9:primeinc` <dbl>
str(cor_mat_tidy)
## tibble [12 × 12] (S3: tbl_df/tbl/data.frame)
##  $ timebin4:primecon: num [1:12] 1 0.1811 0.1402 0.1506 0.0557 ...
##  $ timebin5:primecon: num [1:12] 0.18111 1 0.19399 0.15233 0.00218 ...
##  $ timebin6:primecon: num [1:12] 0.1402 0.194 1 0.1605 0.0334 ...
##  $ timebin7:primecon: num [1:12] 0.1506 0.1523 0.1605 1 0.0786 ...
##  $ timebin8:primecon: num [1:12] 0.05571 0.00218 0.03338 0.07859 1 ...
##  $ timebin9:primecon: num [1:12] 0.0784 0.0865 0.048 0.0548 0.181 ...
##  $ timebin4:primeinc: num [1:12] 0.157 0.2193 0.169 0.1509 0.0387 ...
##  $ timebin5:primeinc: num [1:12] 0.1534 0.2218 0.1637 0.1635 0.0405 ...
##  $ timebin6:primeinc: num [1:12] 0.1287 0.1457 0.1891 0.1826 0.0658 ...
##  $ timebin7:primeinc: num [1:12] 0.0853 0.0849 0.1443 0.143 0.2001 ...
##  $ timebin8:primeinc: num [1:12] 0.0739 0.031 0.0924 0.1406 0.1671 ...
##  $ timebin9:primeinc: num [1:12] 0.06086 0.03899 0.00569 0.04339 0.08991 ...
## check with the model summary
# summary(bi)
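The names_pattern regex can also be tested outside pivot_longer() on a single column name; a small base-R check (the example name is what rename_with() produces from the array's dimnames):

```r
nm  <- "Estimate_timebin4:primecon"          # example column name after rename_with()
pat <- "([a-zA-Z_\\d?]+)_([a-zA-Z:\\d?]+)"   # the names_pattern used above

# regexec() returns the full match plus the two capture groups; drop the full match
parts <- regmatches(nm, regexec(pat, nm, perl = TRUE))[[1]][-1]
parts
```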

## remove the names and convert to a plain matrix, which is the format faux expects
## full cor mat
cor_mat <- unname(as.matrix(cor_mat_tidy))
cor_mat
##             [,1]        [,2]        [,3]       [,4]        [,5]       [,6]
##  [1,] 1.00000000 0.181111610 0.140227052 0.15064403 0.055708399 0.07844122
##  [2,] 0.18111161 1.000000000 0.193985724 0.15232674 0.002180379 0.08649338
##  [3,] 0.14022705 0.193985724 1.000000000 0.16047883 0.033383923 0.04798801
##  [4,] 0.15064403 0.152326739 0.160478829 1.00000000 0.078594351 0.05482625
##  [5,] 0.05570840 0.002180379 0.033383923 0.07859435 1.000000000 0.18103101
##  [6,] 0.07844122 0.086493375 0.047988009 0.05482625 0.181031007 1.00000000
##  [7,] 0.15700680 0.219264583 0.168973923 0.15088219 0.038736009 0.08194166
##  [8,] 0.15340135 0.221750842 0.163709105 0.16348048 0.040533384 0.03581839
##  [9,] 0.12869470 0.145655326 0.189094128 0.18261210 0.065750253 0.01304757
## [10,] 0.08533370 0.084865176 0.144296210 0.14300113 0.200118561 0.11201101
## [11,] 0.07387438 0.030988328 0.092439132 0.14063153 0.167079452 0.01372982
## [12,] 0.06086174 0.038991659 0.005693736 0.04338987 0.089912956 0.15443839
##             [,7]        [,8]        [,9]       [,10]       [,11]        [,12]
##  [1,] 0.15700680  0.15340135  0.12869470  0.08533370  0.07387438  0.060861737
##  [2,] 0.21926458  0.22175084  0.14565533  0.08486518  0.03098833  0.038991659
##  [3,] 0.16897392  0.16370910  0.18909413  0.14429621  0.09243913  0.005693736
##  [4,] 0.15088219  0.16348048  0.18261210  0.14300113  0.14063153  0.043389867
##  [5,] 0.03873601  0.04053338  0.06575025  0.20011856  0.16707945  0.089912956
##  [6,] 0.08194166  0.03581839  0.01304757  0.11201101  0.01372982  0.154438395
##  [7,] 1.00000000  0.17417672  0.15048835  0.09722597  0.04902133  0.047558257
##  [8,] 0.17417672  1.00000000  0.15883356  0.11194773  0.11333808 -0.011738558
##  [9,] 0.15048835  0.15883356  1.00000000  0.21397253  0.18646605 -0.014087671
## [10,] 0.09722597  0.11194773  0.21397253  1.00000000  0.25374893 -0.012638958
## [11,] 0.04902133  0.11333808  0.18646605  0.25374893  1.00000000 -0.039673970
## [12,] 0.04755826 -0.01173856 -0.01408767 -0.01263896 -0.03967397  1.000000000
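Before handing cor_mat to faux it is worth confirming that it is a valid correlation matrix: symmetric, unit diagonal, and positive definite. A minimal sketch with a 3 × 3 stand-in (the same checks apply unchanged to the full 12 × 12 cor_mat):

```r
# 3 x 3 stand-in for cor_mat, built from values like those above
m <- matrix(c(1.00, 0.18, 0.14,
              0.18, 1.00, 0.19,
              0.14, 0.19, 1.00), nrow = 3, byrow = TRUE)

ok <- isSymmetric(m) &&                            # must be symmetric
  all(diag(m) == 1) &&                             # unit diagonal
  all(eigen(m, only.values = TRUE)$values > 0)     # positive definite
ok
# [1] TRUE
```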

set up the data structure

# make it reproducible
set.seed(1)

d1 <- add_random(subj = subj_n, rep = rep_n) %>%
  add_within("subj", condition = c("cond1", "cond2")) %>%
  add_contrast("condition", "treatment", add_cols = TRUE, 
               colnames = c("cond")) %>%
  add_within("rep", bin = 1:bin_n) %>%
  ### create new conditions here that respect the twelve levels b1c, b1i etc.
  mutate(con = if_else(condition == "cond1", 1, 0),
         inc = if_else(condition == "cond2", 1, 0)) %>% 
  mutate(bin1 = if_else(bin == 1, 1, 0),
         bin2 = if_else(bin == 2, 1, 0),
         bin3 = if_else(bin == 3, 1, 0),
         bin4 = if_else(bin == 4, 1, 0),
         bin5 = if_else(bin == 5, 1, 0),
         bin6 = if_else(bin == 6, 1, 0)) %>% 
  # add random effects 
  add_ranef("subj", u1c = u1c_sd, u2c = u2c_sd, u3c = u3c_sd, u4c = u4c_sd,
            u5c = u5c_sd, u6c = u6c_sd,
            u1i = u1i_sd, u2i = u2i_sd, u3i = u3i_sd, u4i = u4i_sd,
            u5i = u5i_sd, u6i = u6i_sd, 
            .cors = cor_mat) %>% 
  # calculate the linear predictor on the cloglog scale
  mutate(cloglog = (b1c + u1c) * con * bin1 + (b2c + u2c) * con * bin2 + 
           (b3c + u3c) * con * bin3 + (b4c + u4c) * con * bin4  + 
           (b5c + u5c) * con * bin5 + (b6c + u6c) * con * bin6 +
           (b1i + u1i) * inc * bin1 + (b2i + u2i) * inc * bin2 + 
           (b3i + u3i) * inc * bin3 + (b4i + u4i) * inc * bin4  + 
           (b5i + u5i) * inc * bin5 + (b6i + u6i) * inc * bin6) %>% 
  # apply the inverse cloglog link to get the event probability
  mutate(prob = clogloglink(cloglog, inverse = TRUE)) %>%  
  # calculate event
  mutate(event = rbinom(n(), 1, prob)) 
head(d1)
## # A tibble: 6 × 28
##   subj   rep    condition  cond   bin   con   inc  bin1  bin2  bin3  bin4  bin5
##   <chr>  <chr>  <fct>     <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 subj01 rep001 cond1         0     1     1     0     1     0     0     0     0
## 2 subj01 rep001 cond1         0     2     1     0     0     1     0     0     0
## 3 subj01 rep001 cond1         0     3     1     0     0     0     1     0     0
## 4 subj01 rep001 cond1         0     4     1     0     0     0     0     1     0
## 5 subj01 rep001 cond1         0     5     1     0     0     0     0     0     1
## 6 subj01 rep001 cond1         0     6     1     0     0     0     0     0     0
## # ℹ 16 more variables: bin6 <dbl>, u1c <dbl>, u2c <dbl>, u3c <dbl>, u4c <dbl>,
## #   u5c <dbl>, u6c <dbl>, u1i <dbl>, u2i <dbl>, u3i <dbl>, u4i <dbl>,
## #   u5i <dbl>, u6i <dbl>, cloglog <dbl>, prob <dbl>, event <int>
str(d1)
## tibble [24,000 × 28] (S3: tbl_df/tbl/data.frame)
##  $ subj     : chr [1:24000] "subj01" "subj01" "subj01" "subj01" ...
##  $ rep      : chr [1:24000] "rep001" "rep001" "rep001" "rep001" ...
##  $ condition: Factor w/ 2 levels "cond1","cond2": 1 1 1 1 1 1 2 2 2 2 ...
##   ..- attr(*, "contrasts")= num [1:2, 1] 0 1
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:2] "cond1" "cond2"
##   .. .. ..$ : chr "cond"
##  $ cond     : num [1:24000] 0 0 0 0 0 0 1 1 1 1 ...
##  $ bin      : int [1:24000] 1 2 3 4 5 6 1 2 3 4 ...
##  $ con      : num [1:24000] 1 1 1 1 1 1 0 0 0 0 ...
##  $ inc      : num [1:24000] 0 0 0 0 0 0 1 1 1 1 ...
##  $ bin1     : num [1:24000] 1 0 0 0 0 0 1 0 0 0 ...
##  $ bin2     : num [1:24000] 0 1 0 0 0 0 0 1 0 0 ...
##  $ bin3     : num [1:24000] 0 0 1 0 0 0 0 0 1 0 ...
##  $ bin4     : num [1:24000] 0 0 0 1 0 0 0 0 0 1 ...
##  $ bin5     : num [1:24000] 0 0 0 0 1 0 0 0 0 0 ...
##  $ bin6     : num [1:24000] 0 0 0 0 0 1 0 0 0 0 ...
##  $ u1c      : num [1:24000] 0.361 0.361 0.361 0.361 0.361 ...
##  $ u2c      : num [1:24000] 2.76 2.76 2.76 2.76 2.76 ...
##  $ u3c      : num [1:24000] 0.0637 0.0637 0.0637 0.0637 0.0637 ...
##  $ u4c      : num [1:24000] -0.114 -0.114 -0.114 -0.114 -0.114 ...
##  $ u5c      : num [1:24000] -0.491 -0.491 -0.491 -0.491 -0.491 ...
##  $ u6c      : num [1:24000] -0.6 -0.6 -0.6 -0.6 -0.6 ...
##  $ u1i      : num [1:24000] -0.12 -0.12 -0.12 -0.12 -0.12 ...
##  $ u2i      : num [1:24000] -0.0433 -0.0433 -0.0433 -0.0433 -0.0433 ...
##  $ u3i      : num [1:24000] 2.13 2.13 2.13 2.13 2.13 ...
##  $ u4i      : num [1:24000] -0.934 -0.934 -0.934 -0.934 -0.934 ...
##  $ u5i      : num [1:24000] 0.507 0.507 0.507 0.507 0.507 ...
##  $ u6i      : num [1:24000] 0.155 0.155 0.155 0.155 0.155 ...
##  $ cloglog  : num [1:24000] -4.299 -0.479 -2.256 -1.604 -1.701 ...
##  $ prob     : num [1:24000] 0.0135 0.4619 0.0994 0.1822 0.1668 ...
##  $ event    : int [1:24000] 0 1 1 0 0 0 0 0 1 0 ...
summary(d1)
##      subj               rep            condition          cond    
##  Length:24000       Length:24000       cond1:12000   Min.   :0.0  
##  Class :character   Class :character   cond2:12000   1st Qu.:0.0  
##  Mode  :character   Mode  :character                 Median :0.5  
##                                                      Mean   :0.5  
##                                                      3rd Qu.:1.0  
##                                                      Max.   :1.0  
##       bin           con           inc           bin1             bin2       
##  Min.   :1.0   Min.   :0.0   Min.   :0.0   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:2.0   1st Qu.:0.0   1st Qu.:0.0   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :3.5   Median :0.5   Median :0.5   Median :0.0000   Median :0.0000  
##  Mean   :3.5   Mean   :0.5   Mean   :0.5   Mean   :0.1667   Mean   :0.1667  
##  3rd Qu.:5.0   3rd Qu.:1.0   3rd Qu.:1.0   3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :6.0   Max.   :1.0   Max.   :1.0   Max.   :1.0000   Max.   :1.0000  
##       bin3             bin4             bin5             bin6       
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.1667   Mean   :0.1667   Mean   :0.1667   Mean   :0.1667  
##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##       u1c                u2c               u3c                u4c          
##  Min.   :-1.48460   Min.   :-3.4790   Min.   :-0.80094   Min.   :-0.50119  
##  1st Qu.:-0.08369   1st Qu.: 0.0192   1st Qu.:-0.16730   1st Qu.:-0.13584  
##  Median : 0.27455   Median : 0.1766   Median : 0.09514   Median :-0.05036  
##  Mean   : 0.10650   Mean   : 0.2040   Mean   : 0.12183   Mean   :-0.02763  
##  3rd Qu.: 0.58677   3rd Qu.: 1.0012   3rd Qu.: 0.69635   3rd Qu.: 0.04075  
##  Max.   : 0.71113   Max.   : 2.7615   Max.   : 0.86895   Max.   : 0.74960  
##       u5c               u6c                 u1i               u2i          
##  Min.   :-0.6096   Min.   :-0.600177   Min.   :-2.2361   Min.   :-1.56358  
##  1st Qu.:-0.2354   1st Qu.:-0.119046   1st Qu.:-0.6386   1st Qu.:-0.68951  
##  Median : 0.1615   Median :-0.007188   Median :-0.2547   Median :-0.09448  
##  Mean   : 0.1655   Mean   : 0.102286   Mean   :-0.1315   Mean   :-0.16867  
##  3rd Qu.: 0.4897   3rd Qu.: 0.295545   3rd Qu.: 0.1644   3rd Qu.: 0.54520  
##  Max.   : 0.9510   Max.   : 1.182506   Max.   : 1.9723   Max.   : 0.66309  
##       u3i               u4i               u5i                u6i          
##  Min.   :-1.4254   Min.   :-1.6222   Min.   :-2.83353   Min.   :-0.71436  
##  1st Qu.:-0.5861   1st Qu.:-0.9336   1st Qu.:-1.18382   1st Qu.:-0.30607  
##  Median : 0.1007   Median :-0.5244   Median :-0.05776   Median :-0.13340  
##  Mean   : 0.3122   Mean   :-0.2627   Mean   :-0.40813   Mean   :-0.02908  
##  3rd Qu.: 1.0056   3rd Qu.: 0.9166   3rd Qu.: 0.50680   3rd Qu.: 0.28242  
##  Max.   : 2.1307   Max.   : 1.2520   Max.   : 0.96731   Max.   : 0.95958  
##     cloglog             prob              event       
##  Min.   :-6.7190   Min.   :0.001207   Min.   :0.0000  
##  1st Qu.:-3.3974   1st Qu.:0.032979   1st Qu.:0.0000  
##  Median :-2.5498   Median :0.075129   Median :0.0000  
##  Mean   :-2.6038   Mean   :0.141855   Mean   :0.1426  
##  3rd Qu.:-1.4892   3rd Qu.:0.201921   3rd Qu.:0.0000  
##  Max.   : 0.5825   Max.   :0.833127   Max.   :1.0000
glimpse(d1)
## Rows: 24,000
## Columns: 28
## $ subj      <chr> "subj01", "subj01", "subj01", "subj01", "subj01", "subj01", …
## $ rep       <chr> "rep001", "rep001", "rep001", "rep001", "rep001", "rep001", …
## $ condition <fct> cond1, cond1, cond1, cond1, cond1, cond1, cond2, cond2, cond…
## $ cond      <dbl> 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, …
## $ bin       <int> 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, …
## $ con       <dbl> 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, …
## $ inc       <dbl> 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, …
## $ bin1      <dbl> 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, …
## $ bin2      <dbl> 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, …
## $ bin3      <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, …
## $ bin4      <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, …
## $ bin5      <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, …
## $ bin6      <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ u1c       <dbl> 0.3612864, 0.3612864, 0.3612864, 0.3612864, 0.3612864, 0.361…
## $ u2c       <dbl> 2.761482, 2.761482, 2.761482, 2.761482, 2.761482, 2.761482, …
## $ u3c       <dbl> 0.06367222, 0.06367222, 0.06367222, 0.06367222, 0.06367222, …
## $ u4c       <dbl> -0.1140557, -0.1140557, -0.1140557, -0.1140557, -0.1140557, …
## $ u5c       <dbl> -0.4909776, -0.4909776, -0.4909776, -0.4909776, -0.4909776, …
## $ u6c       <dbl> -0.6001775, -0.6001775, -0.6001775, -0.6001775, -0.6001775, …
## $ u1i       <dbl> -0.1198404, -0.1198404, -0.1198404, -0.1198404, -0.1198404, …
## $ u2i       <dbl> -0.04330945, -0.04330945, -0.04330945, -0.04330945, -0.04330…
## $ u3i       <dbl> 2.130691, 2.130691, 2.130691, 2.130691, 2.130691, 2.130691, …
## $ u4i       <dbl> -0.9336088, -0.9336088, -0.9336088, -0.9336088, -0.9336088, …
## $ u5i       <dbl> 0.5067988, 0.5067988, 0.5067988, 0.5067988, 0.5067988, 0.506…
## $ u6i       <dbl> 0.1550822, 0.1550822, 0.1550822, 0.1550822, 0.1550822, 0.155…
## $ cloglog   <dbl> -4.2987136, -0.4785179, -2.2563278, -1.6040557, -1.7009776, …
## $ prob      <dbl> 0.01349415, 0.46189479, 0.09943631, 0.18215247, 0.16681954, …
## $ event     <int> 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, …
## save initial data
# write_csv(d1, "Tutorial_4_planning/data/d1.csv") #

summarise the data produced so far

d1_pid <- d1 %>%
  group_by(subj, condition, bin) %>%
  summarise(
    across(cloglog:event, \(x) mean(x, na.rm = TRUE)),
    .groups = "drop"
    )
d1_pid  
## # A tibble: 120 × 6
##    subj   condition   bin cloglog   prob event
##    <chr>  <fct>     <int>   <dbl>  <dbl> <dbl>
##  1 subj01 cond1         1  -4.30  0.0135 0    
##  2 subj01 cond1         2  -0.479 0.462  0.49 
##  3 subj01 cond1         3  -2.26  0.0994 0.065
##  4 subj01 cond1         4  -1.60  0.182  0.155
##  5 subj01 cond1         5  -1.70  0.167  0.14 
##  6 subj01 cond1         6  -1.20  0.260  0.285
##  7 subj01 cond2         1  -4.49  0.0112 0.015
##  8 subj01 cond2         2  -3.25  0.0379 0.035
##  9 subj01 cond2         3  -0.239 0.545  0.525
## 10 subj01 cond2         4  -3.20  0.0398 0.035
## # ℹ 110 more rows
d1_group <- d1 %>%
  group_by(condition, bin) %>%
  summarise(
    across(cloglog:event, \(x) mean(x, na.rm = TRUE)),
    .groups = "drop"
    )
d1_group
## # A tibble: 12 × 5
##    condition   bin cloglog   prob  event
##    <fct>     <int>   <dbl>  <dbl>  <dbl>
##  1 cond1         1  -4.55  0.0121 0.01  
##  2 cond1         2  -3.04  0.0959 0.0995
##  3 cond1         3  -2.20  0.118  0.118 
##  4 cond1         4  -1.52  0.204  0.194 
##  5 cond1         5  -1.04  0.316  0.306 
##  6 cond1         6  -0.498 0.465  0.490 
##  7 cond2         1  -4.50  0.0198 0.021 
##  8 cond2         2  -3.38  0.0408 0.0435
##  9 cond2         3  -2.06  0.182  0.174 
## 10 cond2         4  -2.53  0.110  0.108 
## 11 cond2         5  -3.45  0.0499 0.0515
## 12 cond2         6  -2.48  0.0887 0.096

a quick plot

cloglog

p5.1 <- ggplot(d1_group, aes(x=bin, y = cloglog, colour = condition)) +
   geom_line(aes(group = condition)) + 
   geom_point() +
   geom_jitter(data=d1_pid, alpha = 0.5, width = 0.1, height = 0) +
   # scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25)) +
   scale_x_continuous(breaks = 1:bin_n, labels = 1:bin_n) +
   scale_fill_brewer(palette = "Dark2") +
   scale_colour_brewer(palette = "Dark2") +
   labs(title = "cloglog - simulation d1 (N=10)",
        x = "time bins")
p5.1

event prob

p5.2 <- ggplot(d1_group, aes(x=bin, y = event, colour = condition)) +
   geom_line(aes(group = condition, linetype=condition)) + 
   geom_point() +
   geom_jitter(data=d1_pid, alpha = 0.5, width = 0.1, height = 0) +
   scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25)) +
   scale_x_continuous(breaks = 1:bin_n, labels = 1:bin_n) +
   scale_fill_brewer(palette = "Dark2") +
   scale_colour_brewer(palette = "Dark2") +
   labs(title = "event prob - simulation d1 (N=10)",
        x = "time bins")
p5.2

convert the data into outcome data (this next chunk could be incorporated into the data-structure chunk above; I do it separately here so that you can see what is initially generated)

d1_out <- d1 %>% 
  group_by(subj, rep, condition) %>%
  mutate(cumsum = cumsum(event),
         outcomel = event==1 & cumsum(event) < 2,
         outcomen = if_else(event==1 & cumsum(event) == 1, 1,
                     if_else(cumsum(event) >= 1, NA, 0))) %>% 
  drop_na(outcomen) %>% 
  ungroup()
head(d1_out)
## # A tibble: 6 × 31
##   subj   rep    condition  cond   bin   con   inc  bin1  bin2  bin3  bin4  bin5
##   <chr>  <chr>  <fct>     <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 subj01 rep001 cond1         0     1     1     0     1     0     0     0     0
## 2 subj01 rep001 cond1         0     2     1     0     0     1     0     0     0
## 3 subj01 rep001 cond2         1     1     0     1     1     0     0     0     0
## 4 subj01 rep001 cond2         1     2     0     1     0     1     0     0     0
## 5 subj01 rep001 cond2         1     3     0     1     0     0     1     0     0
## 6 subj01 rep002 cond1         0     1     1     0     1     0     0     0     0
## # ℹ 19 more variables: bin6 <dbl>, u1c <dbl>, u2c <dbl>, u3c <dbl>, u4c <dbl>,
## #   u5c <dbl>, u6c <dbl>, u1i <dbl>, u2i <dbl>, u3i <dbl>, u4i <dbl>,
## #   u5i <dbl>, u6i <dbl>, cloglog <dbl>, prob <dbl>, event <int>, cumsum <int>,
## #   outcomel <lgl>, outcomen <dbl>
## save outcome data
# write_csv(d1_out, "Tutorial_4_planning/data/d1_out.csv")
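The outcomen logic keeps each trial's bins up to and including the first event and discards everything after it, which is the standard person-period format for discrete-time survival models. A minimal base-R illustration on one hypothetical trial:

```r
event <- c(0, 0, 1, 0, 1, 0)   # toy event sequence for one trial across bins 1-6
cs <- cumsum(event)

# same logic as the mutate() call above
outcomen <- ifelse(event == 1 & cs == 1, 1,
                   ifelse(cs >= 1, NA, 0))
outcomen
# [1]  0  0  1 NA NA NA
```

After drop_na(), only the first three bins of this trial survive, which is why d1_out has fewer rows than d1.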

summarise the outcome data, which is generated from the prob values

d1_out_pid <- d1_out %>% 
  group_by(subj, condition, bin) %>% 
  summarise(outcome = mean(outcomen, na.rm = TRUE)) %>% 
  mutate(prime = if_else(condition == "cond1", "con", "inc"),
         prime = factor(prime))
## `summarise()` has grouped output by 'subj', 'condition'. You can override using
## the `.groups` argument.
d1_out_pid
## # A tibble: 120 × 5
## # Groups:   subj, condition [20]
##    subj   condition   bin outcome prime
##    <chr>  <fct>     <int>   <dbl> <fct>
##  1 subj01 cond1         1  0      con  
##  2 subj01 cond1         2  0.49   con  
##  3 subj01 cond1         3  0.0490 con  
##  4 subj01 cond1         4  0.175  con  
##  5 subj01 cond1         5  0.0875 con  
##  6 subj01 cond1         6  0.288  con  
##  7 subj01 cond2         1  0.0150 inc  
##  8 subj01 cond2         2  0.0305 inc  
##  9 subj01 cond2         3  0.545  inc  
## 10 subj01 cond2         4  0.0345 inc  
## # ℹ 110 more rows
d1_out_group <- d1_out %>% 
  group_by(condition, bin) %>% 
  summarise(outcome = mean(outcomen, na.rm = TRUE)) %>% 
  mutate(prime = if_else(condition == "cond1", "con", "inc"),
         prime = factor(prime))
## `summarise()` has grouped output by 'condition'. You can override using the
## `.groups` argument.
d1_out_group
## # A tibble: 12 × 4
## # Groups:   condition [2]
##    condition   bin outcome prime
##    <fct>     <int>   <dbl> <fct>
##  1 cond1         1  0.0100 con  
##  2 cond1         2  0.0995 con  
##  3 cond1         3  0.121  con  
##  4 cond1         4  0.195  con  
##  5 cond1         5  0.322  con  
##  6 cond1         6  0.5    con  
##  7 cond2         1  0.0210 inc  
##  8 cond2         2  0.0429 inc  
##  9 cond2         3  0.178  inc  
## 10 cond2         4  0.108  inc  
## 11 cond2         5  0.0473 inc  
## 12 cond2         6  0.0963 inc

plot outcome using the prime variable to make it comparable to the raw data

p5.3 <- ggplot(d1_out_group, aes(x = bin, y = outcome, colour = prime)) +
  geom_line(aes(group = prime, linetype = prime)) + 
  geom_point() +
  geom_jitter(data=d1_out_pid, alpha=0.5, width = 0.1, height = 0) +
  scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25)) +
  scale_x_continuous(breaks = 1:bin_n, labels = 1:bin_n) +
  scale_colour_brewer(palette = "Dark2") +
  labs(title = "simulation d1 (N=10)",
       x = "time bins",
       y = "outcome")
p5.3

compare the raw data to the simulated data for a single simulated dataset

p5.4 <- (p3.1 / p5.3) +
   plot_layout(guides = 'collect',
               axes = 'collect')
p5.4

ggsave ("Tutorial_4_planning/figures/raw_vs_d1.jpeg",
        width = 8, height = 8, dpi=800)

ok, this looks reasonably sensible.

6. Think about effect sizes

take a look at avg hazards and hazard ratios at the group level in the original data

p3.1

this is the descriptive data per bin and condition for the hazard: the probability of a response/event in a given bin, given that it has not yet occurred

this is the group-level summary data plotted in dat_plot

data_group
## # A tibble: 12 × 3
## # Groups:   timebin [6]
##    timebin prime outcome
##    <fct>   <fct>   <dbl>
##  1 4       con   0.00975
##  2 4       inc   0.0139 
##  3 5       con   0.0478 
##  4 5       inc   0.0423 
##  5 6       con   0.0901 
##  6 6       inc   0.0913 
##  7 7       con   0.188  
##  8 7       inc   0.115  
##  9 8       con   0.252  
## 10 8       inc   0.0495 
## 11 9       con   0.401  
## 12 9       inc   0.0848

calculate the hazard ratio

hazard <- data_group %>% 
  pivot_wider(names_from = "prime",
              values_from = "outcome") %>% 
  mutate(diff = con-inc,
         ratio = inc/con)
hazard
## # A tibble: 6 × 5
## # Groups:   timebin [6]
##   timebin     con    inc     diff ratio
##   <fct>     <dbl>  <dbl>    <dbl> <dbl>
## 1 4       0.00975 0.0139 -0.00416 1.43 
## 2 5       0.0478  0.0423  0.00551 0.885
## 3 6       0.0901  0.0913 -0.00121 1.01 
## 4 7       0.188   0.115   0.0732  0.611
## 5 8       0.252   0.0495  0.203   0.196
## 6 9       0.401   0.0848  0.316   0.211

key hazard and hazard ratio values of interest are as follows (sim bins 4-6 correspond to timebins 7-9 in the original data)…

## bin4
## 0.61 - 39% reduction in hazard from con to inc
## 0.115/0.188

## bin5
## 0.20 - 80% reduction in hazard from con to inc
## 0.0495/0.252

## bin6
## 0.21 - 79% reduction in hazard from con to inc
## 0.0848/0.401
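These ratios are easy to sanity-check in one line of base R, using the con and inc hazards from the table above:

```r
# hazards from timebins 7-9 (sim bins 4-6) in the original data
con <- c(0.188, 0.252, 0.401)
inc <- c(0.115, 0.0495, 0.0848)
round(inc / con, 2)  # hazard ratios, inc relative to con
# 0.61 0.20 0.21
```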

Ok, if we wanted to simulate a bunch of effect sizes, we could think about the following.

Let’s just focus on bin 6 to keep things simple. And keep in mind that we do need to think about the absolute hazard values, as well as the ratio.

inc is close to 0.1 (0.0848 in the original data), so let's fix it there as a round number and derive con from a set of hazard ratios.

Given that the effect of interest was an ~80% reduction in Sven’s 2016 paper (Panis & Schmidt, 2016), let’s play around with a range of hazard ratios, e.g., 0.25, 0.5 and 0.75. These translate to a 75%, 50% and 25% reduction in hazard from con to inc.

bin6 <- tibble(
  effect_id = 1:3,
  inc = rep(0.1, times=3),
  ratio = c(0.25, 0.5, 0.75),
  con = inc/ratio,
  inc_clog = clogloglink(inc),
  con_clog = clogloglink(con)
)
bin6
## # A tibble: 3 × 6
##   effect_id   inc ratio   con inc_clog con_clog
##       <int> <dbl> <dbl> <dbl>    <dbl>    <dbl>
## 1         1   0.1  0.25 0.4      -2.25   -0.672
## 2         2   0.1  0.5  0.2      -2.25   -1.50 
## 3         3   0.1  0.75 0.133    -2.25   -1.94
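If you want to check these values without VGAM, the cloglog link is simple to write down in base R; this is a minimal sketch of the transform and its inverse (the function names here are ours, not VGAM's):

```r
cloglog     <- function(p) log(-log(1 - p))  # same mapping as VGAM::clogloglink(p)
inv_cloglog <- function(x) 1 - exp(-exp(x))  # inverse link, back to the probability scale
round(cloglog(c(0.1, 0.2, 0.4)), 2)
# -2.25 -1.50 -0.67
round(inv_cloglog(-2.25), 2)
# 0.1
```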

7. Create a function to iterate through and simulate many datasets

create a function to simulate data. These values come from the initial model and data from Sven’s paper, which we created above in section 4, but they can be modified when calling the function, which we do below.

index_sim <- function(subj_n = 10, rep_n = 200, bin_n = 6,  # these can be changed when calling the function
                b1c = -4.66, b2c = -3.24, b3c = -2.32, b4c = -1.49, b5c = -1.21, b6c = -0.6, 
                b1i = -4.37, b2i = -3.21, b3i = -2.37, b4i = -2.27, b5i = -3.04, b6i = -2.45, # fixed effects
                u1c_sd = 1.06, u2c_sd = 1.34, u3c_sd = 0.60, u4c_sd = 0.47, u5c_sd = 0.58, u6c_sd = 0.72, 
                u1i_sd = 1.17, u2i_sd = 1.05, u3i_sd = 0.91, u4i_sd = 1.47, u5i_sd = 1.34, u6i_sd = 0.62, # varying effects
                cors = cor_mat,   # correlations between effects
                ... # helps the function work with pmap() below
                ) {
  # set up data structure
  data <- add_random(subj = subj_n, rep = rep_n) %>%
  add_within("subj", condition = c("cond1", "cond2")) %>%
  add_contrast("condition", "treatment", add_cols = TRUE, 
               colnames = c("cond")) %>%
  add_within("rep", bin = 1:bin_n) %>%
  ### create new conditions here that respect the twelve levels b1c, b1i etc.
  mutate(con = if_else(condition == "cond1", 1, 0),
         inc = if_else(condition == "cond2", 1, 0)) %>% 
  mutate(bin1 = if_else(bin == 1, 1, 0),
         bin2 = if_else(bin == 2, 1, 0),
         bin3 = if_else(bin == 3, 1, 0),
         bin4 = if_else(bin == 4, 1, 0),
         bin5 = if_else(bin == 5, 1, 0),
         bin6 = if_else(bin == 6, 1, 0)) %>% 
  # add random effects 
  add_ranef("subj", u1c = u1c_sd, u2c = u2c_sd, u3c = u3c_sd, u4c = u4c_sd,
            u5c = u5c_sd, u6c = u6c_sd,
            u1i = u1i_sd, u2i = u2i_sd, u3i = u3i_sd, u4i = u4i_sd,
            u5i = u5i_sd, u6i = u6i_sd, 
            .cors = cors) %>% 
  # calculate the linear predictor on the cloglog scale
  mutate(cloglog = (b1c + u1c) * con * bin1 + (b2c + u2c) * con * bin2 + 
           (b3c + u3c) * con * bin3 + (b4c + u4c) * con * bin4  + 
           (b5c + u5c) * con * bin5 + (b6c + u6c) * con * bin6 +
           (b1i + u1i) * inc * bin1 + (b2i + u2i) * inc * bin2 + 
           (b3i + u3i) * inc * bin3 + (b4i + u4i) * inc * bin4  + 
           (b5i + u5i) * inc * bin5 + (b6i + u6i) * inc * bin6) %>% 
  # calculate inverse cloglog
  mutate(prob = clogloglink(cloglog, inverse = TRUE)) %>%  
  # calculate event
  mutate(event = rbinom(n(), 1, prob)) %>% 
  group_by(subj, rep, condition) %>%
  mutate(cumsum = cumsum(event),
         outcomel = event==1 & cumsum(event) < 2,
         outcomen = if_else(event==1 & cumsum(event) == 1, 1,
                     if_else(cumsum(event) >= 1, NA, 0))) %>% 
  drop_na(outcomen) %>%
  ungroup()

  # glimpse(data) # only use this when testing the code
  data
}

Here’s a quick example of how our function works. You can change these parameters and create some different data.

test_sim <- index_sim(subj_n = 10, rep_n = 200) # if you uncomment glimpse above,
# it will let you glimpse the data that's generated. this is useful for checking / testing code purposes.
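The cumsum() logic inside index_sim is worth unpacking: for each trial, the first event gets a 1, every later bin gets NA, and drop_na() then removes the post-event bins, so each trial only contributes rows while it is still at risk. A minimal sketch with one hypothetical trial (base ifelse() standing in for dplyr's if_else()):

```r
# one hypothetical trial across 6 bins; rbinom() happens to fire in bins 4 and 6
event <- c(0, 0, 0, 1, 0, 1)
outcomen <- ifelse(event == 1 & cumsum(event) == 1, 1,
                   ifelse(cumsum(event) >= 1, NA, 0))
outcomen
# 0 0 0 1 NA NA  -> drop_na() keeps only bins 1-4
```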

create a function to summarise the simulated data

index_summary <- function(df) {
  df %>% 
  group_by(subj, condition, bin) %>% 
  summarise(n = n(), 
            sum = sum(outcomen),
            mean_outcome = mean(outcomen, na.rm = TRUE),
            sd_outcome = sd(outcomen, na.rm = TRUE),
            sem = (sd_outcome/sqrt(length(unique(rep)))),
            .groups="drop") 
}

test it

test_summary <- index_summary(test_sim)
head(test_summary)
## # A tibble: 6 × 8
##   subj   condition   bin     n   sum mean_outcome sd_outcome     sem
##   <chr>  <fct>     <int> <int> <dbl>        <dbl>      <dbl>   <dbl>
## 1 subj01 cond1         1   200     2       0.0100     0.0997 0.00705
## 2 subj01 cond1         2   198     4       0.0202     0.141  0.0100 
## 3 subj01 cond1         3   194    14       0.0722     0.259  0.0186 
## 4 subj01 cond1         4   180    30       0.167      0.374  0.0279 
## 5 subj01 cond1         5   150    45       0.3        0.460  0.0375 
## 6 subj01 cond1         6   105    41       0.390      0.490  0.0478

8. Simulate data for a range of parameters

vary trial count per condition and effect size; keep pid fixed at N=10. This will take some time, probably hours…go get a beer/coffee. For the purposes of this tutorial, eval is set to FALSE in the chunks below, to save time.

plan(multicore)
x1 <- crossing(
  exp = 1:1000, # number of experiment replicates
  subj_n = 10, # range of subject N
  rep_n = c(100, 200, 400),
  b6i = -2.25,
  b6c = c(-1.94, -1.50, -0.67)
) %>%
  mutate(d = pmap(., index_sim)) %>% 
  mutate(s = map(d, index_summary)) %>% 
  select(-d)
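crossing() behaves like base R's expand.grid(): every combination of parameter values becomes one row, and pmap() then runs index_sim() once per row. A quick base-R sketch of the grid size for this simulation:

```r
grid <- expand.grid(exp = 1:1000, subj_n = 10, rep_n = c(100, 200, 400),
                    b6c = c(-1.94, -1.50, -0.67))
nrow(grid)
# 9000 simulated experiments: 1000 replicates x 3 trial counts x 3 effect sizes
```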

9. Summarise the simulated data

sx1 <- x1 %>%
  unnest(s) %>%
  mutate(exp=factor(exp),
         subj_n=factor(subj_n),
         rep_n=factor(rep_n),
         b6c = factor(b6c,
                      levels = c("-1.94", "-1.5", "-0.67"),
                      labels = c("25%", "50%", "75%")),
         subj=factor(subj),
         bin=factor(bin))
sx1

take a look

head(sx1)
tail(sx1)

sx1 %>% 
  distinct(b6c)

save out a file

## save the first simulation
write_csv(sx1, "Tutorial_4_planning/data/sim1/sim1_data.csv")

calculate summary data

read in the file, if it is already computed.

If necessary, read it from the OSF to start with (because it is a large file). This step is only required to be done once. After that, the file will be in your local directory and you can skip this chunk in the future.

## create a relevant folder, if it does not already exist
sim1_output_dir <- here("Tutorial_4_planning/data/sim1/")

if (!dir.exists(sim1_output_dir)) {
  dir.create(sim1_output_dir)
}

## read in the data from the OSF and store it in a relevant folder, if it does not already exist
sim1_file_name <- here("Tutorial_4_planning/data/sim1/sim1_data.csv")

if (!file.exists(sim1_file_name)) {
  osf_retrieve_node("3dbcs") %>% 
  osf_ls_files(pattern = "Tutorial_4_planning") %>%
  osf_ls_files(pattern = "data") %>%
  osf_ls_files(pattern = "sim1") %>%
  osf_ls_files(pattern = "sim1_data") %>%
  osf_download(path = sim1_output_dir)
}

read the file from a local folder

sx1 <- read_csv("Tutorial_4_planning/data/sim1/sim1_data.csv") %>%
  mutate(exp=factor(exp),
         subj_n=factor(subj_n),
         rep_n=factor(rep_n),
         b6c = factor(b6c,
                      levels = c("-1.94", "-1.5", "-0.67"),
                      labels = c("25%", "50%", "75%")),
         subj=factor(subj),
         bin=factor(bin))
## Rows: 1079143 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): subj, condition
## dbl (11): exp, subj_n, rep_n, b6i, b6c, bin, n, sum, mean_outcome, sd_outcom...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sx1
## # A tibble: 1,079,143 × 13
##    exp   subj_n rep_n   b6i b6c   subj  condition bin       n   sum mean_outcome
##    <fct> <fct>  <fct> <dbl> <fct> <fct> <chr>     <fct> <dbl> <dbl>        <dbl>
##  1 1     10     100   -2.25 25%   subj… cond1     1       100     3       0.03  
##  2 1     10     100   -2.25 25%   subj… cond1     2        97    14       0.144 
##  3 1     10     100   -2.25 25%   subj… cond1     3        83    31       0.373 
##  4 1     10     100   -2.25 25%   subj… cond1     4        52     7       0.135 
##  5 1     10     100   -2.25 25%   subj… cond1     5        45     8       0.178 
##  6 1     10     100   -2.25 25%   subj… cond1     6        37     7       0.189 
##  7 1     10     100   -2.25 25%   subj… cond2     1       100    12       0.12  
##  8 1     10     100   -2.25 25%   subj… cond2     2        88     2       0.0227
##  9 1     10     100   -2.25 25%   subj… cond2     3        86     8       0.0930
## 10 1     10     100   -2.25 25%   subj… cond2     4        78    15       0.192 
## # ℹ 1,079,133 more rows
## # ℹ 2 more variables: sd_outcome <dbl>, sem <dbl>

at the exp level

sx1_exp <- sx1 %>% 
  group_by(exp, subj_n, rep_n, b6c, condition, bin) %>% 
  summarise(exp_mean_outcome = mean(mean_outcome, na.rm = TRUE),
            exp_mean_sum=mean(sum, na.rm = TRUE),
            exp_sum=sum(sum),
            n = length(unique(subj)), 
            sd=sd(mean_outcome, na.rm = TRUE),
            sem = (sd/sqrt(length(unique(subj)))),
            .groups = "drop")
sx1_exp
## # A tibble: 108,000 × 12
##    exp   subj_n rep_n b6c   condition bin   exp_mean_outcome exp_mean_sum
##    <fct> <fct>  <fct> <fct> <chr>     <fct>            <dbl>        <dbl>
##  1 1     10     100   25%   cond1     1               0.013           1.3
##  2 1     10     100   25%   cond1     2               0.0513          5  
##  3 1     10     100   25%   cond1     3               0.174          15.9
##  4 1     10     100   25%   cond1     4               0.207          16.4
##  5 1     10     100   25%   cond1     5               0.249          15.4
##  6 1     10     100   25%   cond1     6               0.146           5.7
##  7 1     10     100   25%   cond2     1               0.023           2.3
##  8 1     10     100   25%   cond2     2               0.0215          2.1
##  9 1     10     100   25%   cond2     3               0.0988          9.5
## 10 1     10     100   25%   cond2     4               0.0876          7.5
## # ℹ 107,990 more rows
## # ℹ 4 more variables: exp_sum <dbl>, n <int>, sd <dbl>, sem <dbl>

at the sim level

sx1_sim <- sx1_exp %>% 
  group_by(subj_n, rep_n, b6c, condition, bin) %>% 
  summarise(sim_mean_outcome = mean(exp_mean_outcome, na.rm = TRUE),
            sim_mean_sum = mean(exp_mean_sum, na.rm = TRUE),
            sim_sum = mean(exp_sum, na.rm = TRUE),
            n = length(unique(exp)), 
            sd=sd(exp_mean_outcome, na.rm = TRUE),
            sem = (sd/sqrt(length(unique(exp)))),
            .groups = "drop")
sx1_sim
## # A tibble: 108 × 11
##    subj_n rep_n b6c   condition bin   sim_mean_outcome sim_mean_sum sim_sum
##    <fct>  <fct> <fct> <chr>     <fct>            <dbl>        <dbl>   <dbl>
##  1 10     100   25%   cond1     1               0.0161         1.61    16.1
##  2 10     100   25%   cond1     2               0.0797         7.81    78.1
##  3 10     100   25%   cond1     3               0.108          9.67    96.7
##  4 10     100   25%   cond1     4               0.217         17.3    173. 
##  5 10     100   25%   cond1     5               0.282         17.8    178. 
##  6 10     100   25%   cond1     6               0.162          7.18    71.8
##  7 10     100   25%   cond2     1               0.0232         2.32    23.2
##  8 10     100   25%   cond2     2               0.0644         6.27    62.7
##  9 10     100   25%   cond2     3               0.123         11.1    111. 
## 10 10     100   25%   cond2     4               0.184         14.3    143. 
## # ℹ 98 more rows
## # ℹ 3 more variables: n <int>, sd <dbl>, sem <dbl>

a quick plot of the data

p9.1 <- ggplot(sx1_sim, 
                aes(x = bin, y = sim_mean_outcome, 
                    colour = condition)) +
  geom_jitter(data=sx1_exp, aes(y = exp_mean_outcome),
              alpha=0.5, width = 0.1, height = 0) +
  geom_line(aes(group = condition)) + 
  geom_point(size=2, colour = "black") +
  geom_errorbar(aes(ymin = sim_mean_outcome-sem*1.96, ymax = sim_mean_outcome+sem*1.96),
                width=.2, colour = "black") +
  scale_colour_brewer(palette = "Dark2") +
  scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25)) +
  labs(x="time bin",
       y="outcome") + 
  facet_grid(fct_rev(rep_n)~b6c)
p9.1

ggsave ("Tutorial_4_planning/figures/sim1/sim_violin.jpeg",
        width = 10, height = 8, dpi=800)

ok, look at some values in bin 6

at the exp level

sx1_exp_check <- sx1_exp %>%
  filter(bin %in% c(6)) %>% 
  select(exp, rep_n, b6c, condition, bin, exp_mean_outcome) %>%
  pivot_wider(names_from = "condition",
              values_from = "exp_mean_outcome") %>% 
  group_by(exp, rep_n, b6c, bin) %>% 
  mutate(ratio = cond2/cond1) 
head(sx1_exp_check)
## # A tibble: 6 × 7
## # Groups:   exp, rep_n, b6c, bin [6]
##   exp   rep_n b6c   bin   cond1  cond2 ratio
##   <fct> <fct> <fct> <fct> <dbl>  <dbl> <dbl>
## 1 1     100   25%   6     0.146 0.116  0.797
## 2 1     100   50%   6     0.268 0.118  0.442
## 3 1     100   75%   6     0.454 0.117  0.259
## 4 1     200   25%   6     0.170 0.147  0.862
## 5 1     200   50%   6     0.225 0.106  0.473
## 6 1     200   75%   6     0.375 0.0940 0.250

and at the sim level

sx1_sim_check <- sx1_exp_check %>%
  group_by(rep_n, b6c, bin) %>% 
  summarise(sim_ratio = mean(ratio),
            sim_sd=sd(ratio),
            sim_sem=sim_sd/sqrt(n()), # n() = 1000 experiments per cell
            .groups = "drop")
head(sx1_sim_check)
## # A tibble: 6 × 6
##   rep_n b6c   bin   sim_ratio sim_sd sim_sem
##   <fct> <fct> <fct>     <dbl>  <dbl>   <dbl>
## 1 100   25%   6         0.756 0.264  0.00834
## 2 100   50%   6         0.527 0.167  0.00529
## 3 100   75%   6         0.282 0.0798 0.00252
## 4 200   25%   6         0.742 0.221  0.00700
## 5 200   50%   6         0.519 0.148  0.00469
## 6 200   75%   6         0.275 0.0724 0.00229

plot ratio values

p9.2 <- ggplot(sx1_sim_check, 
                aes(x = b6c, y = sim_ratio)) +
  geom_point(size=1.5) +
  geom_errorbar(aes(ymin = sim_ratio-sim_sem*1.96, ymax = sim_ratio+sim_sem*1.96)) + 
  geom_hline(yintercept = c(0.25, 0.5, 0.75), colour = "red") +
  scale_colour_brewer(palette = "Dark2") +
  scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25)) +
  labs(x="effect size (% reduction)",
       y="hazard ratio (inc/con)") + 
  facet_wrap(~rep_n)
p9.2

10. calculate power / precision

basic idea - calculate the difference score in hazard between con and inc per bin, and then report how many lower-bound 95% quantile intervals exclude zero. This is of course not a model-based estimate, but we think it should give a reasonable guide to what we might expect.
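Stripped of the bookkeeping, the "power" estimate below is just the proportion of simulated experiments whose lower 95% interval bound lands above zero. A minimal sketch with hypothetical lower bounds:

```r
# hypothetical lower bounds of the 95% CI on the con-inc difference, one per simulated experiment
lower <- c(-0.02, 0.01, 0.05, -0.01, 0.03)
power <- mean(lower > 0)  # proportion of sims whose interval excludes zero
power
# 0.6
```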

calculate difference scores

at the exp level

sx1_diff_exp <- sx1 %>%
  pivot_wider(id_cols = c(exp, subj_n, rep_n, b6c, subj, bin),
              names_from = "condition",
              values_from = "mean_outcome") %>% 
  mutate(diff = cond1 - cond2) %>%
  group_by(exp, subj_n, rep_n, b6c, bin) %>% 
  summarise(exp_mean_diff = mean(diff, na.rm = TRUE),
            exp_sd = sd(diff, na.rm = TRUE),
            n = n(), # n here is the number of subjects per experiment
            exp_sem = (exp_sd/sqrt(n)),
            exp_ci = 1.96*exp_sem,
            exp_dz = exp_mean_diff/exp_sd,
            .groups = "drop") 
head(sx1_diff_exp)
## # A tibble: 6 × 11
##   exp   subj_n rep_n b6c   bin   exp_mean_diff exp_sd     n exp_sem exp_ci
##   <fct> <fct>  <fct> <fct> <fct>         <dbl>  <dbl> <int>   <dbl>  <dbl>
## 1 1     10     100   25%   1           -0.01   0.0406    10  0.0128 0.0251
## 2 1     10     100   25%   2            0.0297 0.0504    10  0.0159 0.0312
## 3 1     10     100   25%   3            0.0755 0.152     10  0.0482 0.0944
## 4 1     10     100   25%   4            0.120  0.147     10  0.0466 0.0914
## 5 1     10     100   25%   5            0.0673 0.262     10  0.0828 0.162 
## 6 1     10     100   25%   6            0.0296 0.124     10  0.0391 0.0767
## # ℹ 1 more variable: exp_dz <dbl>
tail(sx1_diff_exp)
## # A tibble: 6 × 11
##   exp   subj_n rep_n b6c   bin   exp_mean_diff exp_sd     n exp_sem exp_ci
##   <fct> <fct>  <fct> <fct> <fct>         <dbl>  <dbl> <int>   <dbl>  <dbl>
## 1 1000  10     400   75%   1           0.00475 0.0254    10 0.00803 0.0157
## 2 1000  10     400   75%   2          -0.0171  0.119     10 0.0377  0.0739
## 3 1000  10     400   75%   3          -0.0714  0.131     10 0.0413  0.0809
## 4 1000  10     400   75%   4           0.0297  0.200     10 0.0632  0.124 
## 5 1000  10     400   75%   5           0.103   0.248     10 0.0786  0.154 
## 6 1000  10     400   75%   6           0.334   0.187     10 0.0592  0.116 
## # ℹ 1 more variable: exp_dz <dbl>

at the group/sim level

sx1_diff_sim <- sx1_diff_exp %>% 
  group_by(subj_n, rep_n, b6c, bin) %>% 
  summarise(mean_diff = mean(exp_mean_diff, na.rm = TRUE),
            sd = sd(exp_mean_diff, na.rm = TRUE),
            n=n(),
            sem = (sd/sqrt((n))),
            ci = 1.96*sem) 
## `summarise()` has grouped output by 'subj_n', 'rep_n', 'b6c'. You can override
## using the `.groups` argument.
sx1_diff_sim
## # A tibble: 54 × 9
## # Groups:   subj_n, rep_n, b6c [9]
##    subj_n rep_n b6c   bin   mean_diff     sd     n      sem       ci
##    <fct>  <fct> <fct> <fct>     <dbl>  <dbl> <int>    <dbl>    <dbl>
##  1 10     100   25%   1      -0.00711 0.0136  1000 0.000431 0.000845
##  2 10     100   25%   2       0.0153  0.0435  1000 0.00138  0.00270 
##  3 10     100   25%   3      -0.0145  0.0401  1000 0.00127  0.00249 
##  4 10     100   25%   4       0.0326  0.0721  1000 0.00228  0.00447 
##  5 10     100   25%   5       0.190   0.0588  1000 0.00186  0.00365 
##  6 10     100   25%   6       0.0459  0.0463  1000 0.00146  0.00287 
##  7 10     100   50%   1      -0.00897 0.0152  1000 0.000482 0.000945
##  8 10     100   50%   2       0.0155  0.0395  1000 0.00125  0.00245 
##  9 10     100   50%   3      -0.0133  0.0399  1000 0.00126  0.00247 
## 10 10     100   50%   4       0.0331  0.0744  1000 0.00235  0.00461 
## # ℹ 44 more rows

plot

violin

diff in original units

p10.1 <- ggplot(sx1_diff_exp, aes(x=bin, y = exp_mean_diff,
                                colour = bin, fill = bin)) +
   geom_jitter(alpha = 0.5, width = 0.1) +
   geom_violin(alpha = 0.7) +
   geom_point(data = sx1_diff_sim, 
             aes(y = mean_diff), size = 3, position=pd2, colour="black") +
   geom_errorbar(data = sx1_diff_sim,
                aes(y = mean_diff, ymin = mean_diff-ci, ymax = mean_diff+ci),
                width=.2, position=pd2, colour = "black") +
   geom_hline(yintercept = 0, colour = "black", linetype = "dashed") +
   scale_fill_brewer(palette = "Dark2") +
   scale_colour_brewer(palette = "Dark2") +
   theme(legend.position = "none") +
   ylab("mean outcome") +
   ggtitle("difference score (cond1-cond2)") +
   facet_grid(fct_rev(rep_n)~b6c)
p10.1

ggsave("Tutorial_4_planning/figures/sim1/sim_diffs.jpeg",
       width = 10, height = 8, dpi=800)

plot each exp’s difference score and associated 95% interval

create some factors and new variables

sx1_diff_exp_2 <- sx1_diff_exp %>%
  group_by(exp, subj_n, rep_n, b6c, bin) %>% 
  mutate(lower = exp_mean_diff-exp_ci,
         upper = exp_mean_diff+exp_ci,
         above_zero = if_else(lower > 0, "yes", "no"), 
         above_zero = factor(above_zero, levels = c("no", "yes")))
sx1_diff_exp_2
## # A tibble: 54,000 × 14
## # Groups:   exp, subj_n, rep_n, b6c, bin [54,000]
##    exp   subj_n rep_n b6c   bin   exp_mean_diff exp_sd     n exp_sem exp_ci
##    <fct> <fct>  <fct> <fct> <fct>         <dbl>  <dbl> <int>   <dbl>  <dbl>
##  1 1     10     100   25%   1           -0.01   0.0406    10 0.0128  0.0251
##  2 1     10     100   25%   2            0.0297 0.0504    10 0.0159  0.0312
##  3 1     10     100   25%   3            0.0755 0.152     10 0.0482  0.0944
##  4 1     10     100   25%   4            0.120  0.147     10 0.0466  0.0914
##  5 1     10     100   25%   5            0.0673 0.262     10 0.0828  0.162 
##  6 1     10     100   25%   6            0.0296 0.124     10 0.0391  0.0767
##  7 1     10     100   50%   1            0.008  0.0181    10 0.00573 0.0112
##  8 1     10     100   50%   2           -0.0375 0.0487    10 0.0154  0.0302
##  9 1     10     100   50%   3           -0.0220 0.116     10 0.0367  0.0720
## 10 1     10     100   50%   4           -0.115  0.336     10 0.106   0.209 
## # ℹ 53,990 more rows
## # ℹ 4 more variables: exp_dz <dbl>, lower <dbl>, upper <dbl>, above_zero <fct>

calculate “power” in a quick and dirty way based on 95% CI

power_x1 <- sx1_diff_exp_2 %>%
  group_by(subj_n, rep_n, b6c, bin) %>%
  mutate(check = ifelse(lower > 0, 1, 0)) %>%
  summarise(power = mean(check, na.rm = TRUE)) %>% 
  filter(bin %in% c(6))
## `summarise()` has grouped output by 'subj_n', 'rep_n', 'b6c'. You can override
## using the `.groups` argument.
power_x1
## # A tibble: 9 × 5
## # Groups:   subj_n, rep_n, b6c [9]
##   subj_n rep_n b6c   bin   power
##   <fct>  <fct> <fct> <fct> <dbl>
## 1 10     100   25%   6     0.202
## 2 10     100   50%   6     0.608
## 3 10     100   75%   6     0.993
## 4 10     200   25%   6     0.21 
## 5 10     200   50%   6     0.675
## 6 10     200   75%   6     0.999
## 7 10     400   25%   6     0.216
## 8 10     400   50%   6     0.693
## 9 10     400   75%   6     0.998

plot power

p10.2 <- ggplot(power_x1, aes(x = b6c, y=rep_n, fill = power)) +
  geom_tile() +
  geom_text(aes(label = sprintf("%.3f", power)), color = "white", size = 10) +
  scale_fill_viridis_c(limits = c(0, 1)) +
  facet_wrap(~bin) +
  labs(y="rep_n", x="b6c")
p10.2

ggsave ("Tutorial_4_planning/figures/sim1/power.jpeg",
        width = 10, height = 6, dpi=800)

join power to the df

sx1_diff_power <- sx1_diff_exp_2 %>%
  filter(bin %in% c(6)) %>% 
  inner_join(power_x1, by = c("subj_n", "rep_n", "b6c", "bin")) %>% 
  mutate(power = round(power * 100, 2)) 
head(sx1_diff_power)
## # A tibble: 6 × 15
## # Groups:   exp, subj_n, rep_n, b6c, bin [6]
##   exp   subj_n rep_n b6c   bin   exp_mean_diff exp_sd     n exp_sem exp_ci
##   <fct> <fct>  <fct> <fct> <fct>         <dbl>  <dbl> <int>   <dbl>  <dbl>
## 1 1     10     100   25%   6            0.0296  0.124    10  0.0391 0.0767
## 2 1     10     100   50%   6            0.150   0.136    10  0.0431 0.0846
## 3 1     10     100   75%   6            0.337   0.224    10  0.0710 0.139 
## 4 1     10     200   25%   6            0.0234  0.166    10  0.0526 0.103 
## 5 1     10     200   50%   6            0.118   0.230    10  0.0727 0.143 
## 6 1     10     200   75%   6            0.281   0.259    10  0.0820 0.161 
## # ℹ 5 more variables: exp_dz <dbl>, lower <dbl>, upper <dbl>, above_zero <fct>,
## #   power <dbl>

plot

p10.3 <- sx1_diff_power %>%
  ggplot(aes(x = exp, y = exp_mean_diff, ymin = lower, ymax = upper)) +
  geom_pointrange(fatten = 1/2, aes(colour=above_zero)) +
  geom_hline(yintercept = 0, colour = "red") +
  scale_colour_manual(values=c("darkgrey","black")) +
  geom_text(aes(x=700, y=-0.35, label = sprintf("%.1f%s", power, "% power")), 
            color = "darkgrey", size = 5) +
  theme(legend.position = "none") +
  labs(x = "sim # (i.e., simulation index)",
       y = "hazard difference") +
  scale_x_discrete(breaks = c(250,500,750,1000)) +
  facet_grid(fct_rev(rep_n)~b6c)
p10.3

ggsave ("Tutorial_4_planning/figures/sim1/bin6_diffs.jpeg",
        width = 10, height = 8, dpi=800)

plot power as a bar plot

p10.4 <- ggplot(power_x1, aes(x=rep_n, y=power,
                           colour = b6c, fill = b6c)) +
  geom_col(alpha = 0.5) +
  geom_hline(yintercept = 0.8, colour = "red", linetype = "dashed") +
  geom_hline(yintercept = 0.9, colour = "black", linetype = "dashed") +
  geom_hline(yintercept = 0.95, colour = "blue", linetype = "dashed") +
  scale_colour_brewer(palette = "Dark2") +
  scale_fill_brewer(palette = "Dark2") +
  labs(title = "Simulation 1") +
  theme(legend.position = "none") +
  facet_wrap(~b6c, nrow=1)
p10.4

ggsave ("Tutorial_4_planning/figures/sim1/bin6_power_col.jpeg",
        width = 8, height = 6, dpi=800)

ok, on the basis of this analysis, we might make the following conclusions:

  • effects of con vs inc that are as large as previously reported by Panis & Schmidt (2016) would be easy to detect with 100 trials and 10 pids, and probably with less data than that too.

  • Across all effect sizes, there is little to no benefit of adding more than 200 trials per condition.

  • Within the context of these absolute hazard values, somewhere between a 50% and a 75% reduction would give you 80% power to detect an effect, should it exist. e.g., a 75% reduction gives nearly 100% power and a 50% reduction gives ~70% power, so maybe a 60% reduction would give 80% power, with N=10 and trials = 200. We could run more sims to figure that out.

  • Other analyses to run - how many individual pids per exp show a clear effect of con vs inc in bin 6? This would be useful to know if we used the small-N design logic and the individual as the unit of replication.

11. Run a follow-up simulation

Let’s try a 50%, 60% and 70% reduction across N=10, 15 and 20.

let’s check effect sizes quickly.

bin6_v2 <- tibble(
  effect_id = 1:3,
  inc = rep(0.1, times=3),
  ratio = c(0.3, 0.4, 0.5),
  con = inc/ratio,
  inc_clog = clogloglink(inc),
  con_clog = clogloglink(con)
)
bin6_v2
## # A tibble: 3 × 6
##   effect_id   inc ratio   con inc_clog con_clog
##       <int> <dbl> <dbl> <dbl>    <dbl>    <dbl>
## 1         1   0.1   0.3 0.333    -2.25   -0.903
## 2         2   0.1   0.4 0.25     -2.25   -1.25 
## 3         3   0.1   0.5 0.2      -2.25   -1.50

run the sim

vary effect size and pid. As before, this will take some time, probably hours…go get another beer/coffee. And again, eval is set to FALSE, to save time for the tutorial.

plan(multicore)
x2 <- crossing(
  exp = 1:1000, # number of experiment replicates
  subj_n = c(10,15,20), # range of subject N
  rep_n = 200,
  b6i = -2.25,
  b6c = c(-0.90, -1.25, -1.50)
) %>%
  mutate(d = pmap(., index_sim)) %>% 
  mutate(s = map(d, index_summary)) %>% 
  select(-d)

summarise the simulated data

sx2 <- x2 %>%
  unnest(s) %>%
  mutate(exp=factor(exp),
         subj_n=factor(subj_n),
         rep_n=factor(rep_n),
         b6c = factor(b6c,
                      levels = c("-1.5", "-1.25", "-0.9"),
                      labels = c("50%", "60%", "70%")),
         subj=factor(subj),
         bin=factor(bin))
sx2

take a look

head(sx2)
tail(sx2)

sx2 %>% 
  distinct(b6c)

save out a file

## save the first simulation
write_csv(sx2, "Tutorial_4_planning/data/sim2/sim2_data.csv")

calculate summary data

read in the file, if it is already computed.

If necessary, read it from the OSF to start with (because it is a large file). This step is only required to be done once. After that, the file will be in your local directory and you can skip this chunk in the future. To do so, set eval=TRUE in the chunk below, then re-set to FALSE once the data have been downloaded.

## create a relevant folder, if it does not already exist
sim2_output_dir <- here("Tutorial_4_planning/data/sim2/")

if (!dir.exists(sim2_output_dir)) {
  dir.create(sim2_output_dir)
}

## read in the data from the OSF and store it in a relevant folder, if it does not already exist
sim2_file_name <- here("Tutorial_4_planning/data/sim2/sim2_data.csv")

if (!file.exists(sim2_file_name)) {
  osf_retrieve_node("3dbcs") %>% 
  osf_ls_files(pattern = "Tutorial_4_planning") %>%
  osf_ls_files(pattern = "data") %>%
  osf_ls_files(pattern = "sim2") %>%
  osf_ls_files(pattern = "sim2_data") %>%
  osf_download(path = sim2_output_dir)
}

read the file from a local folder

sx2 <- read_csv("Tutorial_4_planning/data/sim2/sim2_data.csv") %>%
  mutate(exp=factor(exp),
         subj_n=factor(subj_n),
         rep_n=factor(rep_n),
         b6c = factor(b6c,
                      levels = c("-1.5", "-1.25", "-0.9"),
                      labels = c("50%", "60%", "70%")),
         subj=factor(subj),
         bin=factor(bin))
## Rows: 1618687 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): subj, condition
## dbl (11): exp, subj_n, rep_n, b6i, b6c, bin, n, sum, mean_outcome, sd_outcom...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sx2
## # A tibble: 1,618,687 × 13
##    exp   subj_n rep_n   b6i b6c   subj  condition bin       n   sum mean_outcome
##    <fct> <fct>  <fct> <dbl> <fct> <fct> <chr>     <fct> <dbl> <dbl>        <dbl>
##  1 1     10     200   -2.25 50%   subj… cond1     1       200     1       0.005 
##  2 1     10     200   -2.25 50%   subj… cond1     2       199     6       0.0302
##  3 1     10     200   -2.25 50%   subj… cond1     3       193     7       0.0363
##  4 1     10     200   -2.25 50%   subj… cond1     4       186    16       0.0860
##  5 1     10     200   -2.25 50%   subj… cond1     5       170    30       0.176 
##  6 1     10     200   -2.25 50%   subj… cond1     6       140    26       0.186 
##  7 1     10     200   -2.25 50%   subj… cond2     1       200     1       0.005 
##  8 1     10     200   -2.25 50%   subj… cond2     2       199     8       0.0402
##  9 1     10     200   -2.25 50%   subj… cond2     3       191    14       0.0733
## 10 1     10     200   -2.25 50%   subj… cond2     4       177     2       0.0113
## # ℹ 1,618,677 more rows
## # ℹ 2 more variables: sd_outcome <dbl>, sem <dbl>

at the exp level

sx2_exp <- sx2 %>% 
  group_by(exp, subj_n, rep_n, b6c, condition, bin) %>% 
  summarise(exp_mean_outcome = mean(mean_outcome, na.rm = TRUE),
            exp_mean_sum=mean(sum, na.rm = TRUE),
            exp_sum=sum(sum),
            n = length(unique(subj)), 
            sd=sd(mean_outcome, na.rm = TRUE),
            sem = (sd/sqrt(length(unique(subj)))),
            .groups = "drop")
sx2_exp
## # A tibble: 108,000 × 12
##    exp   subj_n rep_n b6c   condition bin   exp_mean_outcome exp_mean_sum
##    <fct> <fct>  <fct> <fct> <chr>     <fct>            <dbl>        <dbl>
##  1 1     10     200   50%   cond1     1               0.014           2.8
##  2 1     10     200   50%   cond1     2               0.0449          8.8
##  3 1     10     200   50%   cond1     3               0.0550         10.3
##  4 1     10     200   50%   cond1     4               0.200          35.4
##  5 1     10     200   50%   cond1     5               0.305          43  
##  6 1     10     200   50%   cond1     6               0.306          28.9
##  7 1     10     200   50%   cond2     1               0.013           2.6
##  8 1     10     200   50%   cond2     2               0.0410          8.1
##  9 1     10     200   50%   cond2     3               0.123          22.3
## 10 1     10     200   50%   cond2     4               0.219          31.7
## # ℹ 107,990 more rows
## # ℹ 4 more variables: exp_sum <dbl>, n <int>, sd <dbl>, sem <dbl>

at the sim level

sx2_sim <- sx2_exp %>% 
  group_by(subj_n, rep_n, b6c, condition, bin) %>% 
  summarise(sim_mean_outcome = mean(exp_mean_outcome, na.rm = TRUE),
            sim_mean_sum = mean(exp_mean_sum, na.rm = TRUE),
            sim_sum = mean(exp_sum, na.rm = TRUE),
            n = length(unique(exp)), 
            sd=sd(exp_mean_outcome, na.rm = TRUE),
            sem = (sd/sqrt(length(unique(exp)))),
            .groups = "drop")
sx2_sim
## # A tibble: 108 × 11
##    subj_n rep_n b6c   condition bin   sim_mean_outcome sim_mean_sum sim_sum
##    <fct>  <fct> <fct> <chr>     <fct>            <dbl>        <dbl>   <dbl>
##  1 10     200   50%   cond1     1               0.0162         3.25    32.5
##  2 10     200   50%   cond1     2               0.0776        15.2    152. 
##  3 10     200   50%   cond1     3               0.108         19.4    194. 
##  4 10     200   50%   cond1     4               0.218         35.0    350. 
##  5 10     200   50%   cond1     5               0.285         36.0    360. 
##  6 10     200   50%   cond1     6               0.236         20.9    209. 
##  7 10     200   50%   cond2     1               0.0235         4.70    47.0
##  8 10     200   50%   cond2     2               0.0650        12.6    126. 
##  9 10     200   50%   cond2     3               0.124         22.3    223. 
## 10 10     200   50%   cond2     4               0.186         28.8    288. 
## # ℹ 98 more rows
## # ℹ 3 more variables: n <int>, sd <dbl>, sem <dbl>

a quick plot of the data

p11.1 <- ggplot(sx2_sim, 
                aes(x = bin, y = sim_mean_outcome, 
                    colour = condition)) +
  geom_jitter(data=sx2_exp, aes(y = exp_mean_outcome),
              alpha=0.5, width = 0.1, height = 0) +
  geom_line(aes(group = condition)) + 
  geom_point(size=2, colour = "black") +
  geom_errorbar(aes(ymin = sim_mean_outcome-sem*1.96, ymax = sim_mean_outcome+sem*1.96),
                width=.2, colour = "black") +
  scale_colour_brewer(palette = "Dark2") +
  scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25)) +
  labs(x="time bin",
       y="outcome") + 
  facet_grid(fct_rev(subj_n)~b6c)
p11.1

ggsave("Tutorial_4_planning/figures/sim2/sim_violin.jpeg",
       width = 10, height = 8, dpi=800)

ok, look at some values in bin 6

at the exp level

sx2_exp_check <- sx2_exp %>%
  filter(bin %in% c(6)) %>% 
  select(exp, subj_n, b6c, condition, bin, exp_mean_outcome) %>%
  pivot_wider(names_from = "condition",
              values_from = "exp_mean_outcome") %>% 
  group_by(exp, subj_n, b6c, bin) %>% 
  mutate(ratio = cond2/cond1) 
head(sx2_exp_check)
## # A tibble: 6 × 7
## # Groups:   exp, subj_n, b6c, bin [6]
##   exp   subj_n b6c   bin   cond1 cond2 ratio
##   <fct> <fct>  <fct> <fct> <dbl> <dbl> <dbl>
## 1 1     10     50%   6     0.306 0.143 0.466
## 2 1     10     60%   6     0.295 0.119 0.403
## 3 1     10     70%   6     0.399 0.104 0.261
## 4 1     15     50%   6     0.185 0.113 0.613
## 5 1     15     60%   6     0.234 0.129 0.550
## 6 1     15     70%   6     0.327 0.123 0.376

and at the sim level

sx2_sim_check <- sx2_exp_check %>%
  group_by(subj_n, b6c, bin) %>% 
  summarise(sim_ratio = mean(ratio),
            sim_sd=sd(ratio),
            sim_sem=sim_sd/sqrt(1000), # 1000 simulated experiments per cell
            .groups = "drop")
head(sx2_sim_check)
## # A tibble: 6 × 6
##   subj_n b6c   bin   sim_ratio sim_sd sim_sem
##   <fct>  <fct> <fct>     <dbl>  <dbl>   <dbl>
## 1 10     50%   6         0.517 0.142  0.00450
## 2 10     60%   6         0.429 0.118  0.00373
## 3 10     70%   6         0.328 0.0888 0.00281
## 4 15     50%   6         0.513 0.125  0.00395
## 5 15     60%   6         0.417 0.0968 0.00306
## 6 15     70%   6         0.327 0.0722 0.00228

plot ratio values

p11.2 <- ggplot(sx2_sim_check, 
                aes(x = b6c, y = sim_ratio)) +
  geom_point(size=1.5) +
  geom_errorbar(aes(ymin = sim_ratio-sim_sem*1.96, ymax = sim_ratio+sim_sem*1.96)) + 
  geom_hline(yintercept = c(0.3, 0.4, 0.5), colour = "red") +
  scale_colour_brewer(palette = "Dark2") +
  scale_y_continuous(limits = c(0,1), breaks = seq(0,1,0.25)) +
  labs(x="b6c",
       y="ratio (cond2/cond1)") + 
  facet_wrap(~subj_n)
p11.2

calculate power / precision

Same basic idea as in Simulation 1.
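As a reminder of that idea: “power” here is just the proportion of simulated experiments whose 95% CI on the mean difference score excludes zero. A minimal base-R sketch with made-up numbers (the effect size and SD below are illustrative, not taken from the simulation):

```r
# quick-and-dirty power from simulated experiments (toy data, for illustration)
set.seed(1)
n_exp  <- 1000  # simulated experiments
n_subj <- 10    # subjects per experiment

toy <- expand.grid(exp = 1:n_exp, subj = 1:n_subj)
toy$diff <- rnorm(nrow(toy), mean = 0.15, sd = 0.18)  # assumed true difference score

# per-experiment mean and sd of the difference scores
per_exp <- aggregate(diff ~ exp, data = toy,
                     FUN = function(x) c(m = mean(x), s = sd(x)))
m     <- per_exp$diff[, "m"]
sem   <- per_exp$diff[, "s"] / sqrt(n_subj)
lower <- m - 1.96 * sem  # lower bound of each experiment's 95% CI

# power = proportion of experiments whose lower CI bound sits above zero
power <- mean(lower > 0)
power
```

The tidyverse pipeline below does the same thing, but with difference scores computed from the simulated data and grouped over subj_n, b6c and bin.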

calculate difference scores

at the exp level

sx2_diff_exp <- sx2 %>%
  pivot_wider(id_cols = c(exp, subj_n, rep_n, b6c, subj, bin),
              names_from = "condition",
              values_from = "mean_outcome") %>% 
  mutate(diff = cond1 - cond2) %>%
  group_by(exp, subj_n, rep_n, b6c, bin) %>% 
  summarise(exp_mean_diff = mean(diff, na.rm = TRUE),
            exp_sd = sd(diff, na.rm = TRUE),
            n = n(), # n here is the number of subjects per experiment
            exp_sem = (exp_sd/sqrt(n)),
            exp_ci = 1.96*exp_sem,
            exp_dz = exp_mean_diff/exp_sd,
            .groups = "drop") 
head(sx2_diff_exp)
## # A tibble: 6 × 11
##   exp   subj_n rep_n b6c   bin   exp_mean_diff exp_sd     n exp_sem exp_ci
##   <fct> <fct>  <fct> <fct> <fct>         <dbl>  <dbl> <int>   <dbl>  <dbl>
## 1 1     10     200   50%   1           0.00100 0.0165    10 0.00521 0.0102
## 2 1     10     200   50%   2           0.00386 0.0777    10 0.0246  0.0482
## 3 1     10     200   50%   3          -0.0676  0.108     10 0.0342  0.0671
## 4 1     10     200   50%   4          -0.0189  0.258     10 0.0816  0.160 
## 5 1     10     200   50%   5           0.221   0.184     10 0.0580  0.114 
## 6 1     10     200   50%   6           0.163   0.176     10 0.0558  0.109 
## # ℹ 1 more variable: exp_dz <dbl>
tail(sx2_diff_exp)
## # A tibble: 6 × 11
##   exp   subj_n rep_n b6c   bin   exp_mean_diff exp_sd     n exp_sem exp_ci
##   <fct> <fct>  <fct> <fct> <fct>         <dbl>  <dbl> <int>   <dbl>  <dbl>
## 1 1000  20     200   70%   1          -0.00675 0.0369    20 0.00825 0.0162
## 2 1000  20     200   70%   2          -0.0174  0.0696    20 0.0156  0.0305
## 3 1000  20     200   70%   3           0.0574  0.0611    20 0.0137  0.0268
## 4 1000  20     200   70%   4           0.0291  0.231     20 0.0517  0.101 
## 5 1000  20     200   70%   5           0.180   0.131     20 0.0294  0.0576
## 6 1000  20     200   70%   6           0.277   0.184     20 0.0413  0.0809
## # ℹ 1 more variable: exp_dz <dbl>

at the group/sim level

sx2_diff_sim <- sx2_diff_exp %>% 
  group_by(subj_n, rep_n, b6c, bin) %>% 
  summarise(mean_diff = mean(exp_mean_diff, na.rm = TRUE),
            sd = sd(exp_mean_diff, na.rm = TRUE),
            n=n(), # number of simulated experiments
            sem = (sd/sqrt((n))),
            ci = 1.96*sem) 
## `summarise()` has grouped output by 'subj_n', 'rep_n', 'b6c'. You can override
## using the `.groups` argument.
sx2_diff_sim
## # A tibble: 54 × 9
## # Groups:   subj_n, rep_n, b6c [9]
##    subj_n rep_n b6c   bin   mean_diff     sd     n      sem       ci
##    <fct>  <fct> <fct> <fct>     <dbl>  <dbl> <int>    <dbl>    <dbl>
##  1 10     200   50%   1      -0.00728 0.0125  1000 0.000394 0.000772
##  2 10     200   50%   2       0.0126  0.0408  1000 0.00129  0.00253 
##  3 10     200   50%   3      -0.0157  0.0386  1000 0.00122  0.00239 
##  4 10     200   50%   4       0.0320  0.0734  1000 0.00232  0.00455 
##  5 10     200   50%   5       0.192   0.0572  1000 0.00181  0.00355 
##  6 10     200   50%   6       0.118   0.0496  1000 0.00157  0.00308 
##  7 10     200   60%   1      -0.00780 0.0139  1000 0.000439 0.000861
##  8 10     200   60%   2       0.0138  0.0391  1000 0.00124  0.00242 
##  9 10     200   60%   3      -0.0147  0.0388  1000 0.00123  0.00240 
## 10 10     200   60%   4       0.0335  0.0685  1000 0.00217  0.00424 
## # ℹ 44 more rows

plot

violin

diff in original units

p11.3 <- ggplot(sx2_diff_exp, aes(x=bin, y = exp_mean_diff,
                                  colour = bin, fill = bin)) +
   geom_jitter(alpha = 0.5, width = 0.1) +
   geom_violin(alpha = 0.7) +
   geom_point(data = sx2_diff_sim, 
             aes(y = mean_diff), size = 3, position=pd2, colour="black") +
   geom_errorbar(data = sx2_diff_sim,
                aes(y = mean_diff, ymin = mean_diff-ci, ymax = mean_diff+ci),
                width=.2, position=pd2, colour = "black") +
   geom_hline(yintercept = 0, colour = "black", linetype = "dashed") +
   scale_fill_brewer(palette = "Dark2") +
   scale_colour_brewer(palette = "Dark2") +
   theme(legend.position = "none") +
   ylab("mean outcome") +
   ggtitle("difference score (cond1-cond2)") +
   facet_grid(fct_rev(subj_n)~b6c)
p11.3

ggsave("Tutorial_4_planning/figures/sim2/sim_diffs.jpeg",
       width = 10, height = 8, dpi=800)

plot each exp’s difference score and associated 95% interval

create some factors and new variables

sx2_diff_exp_2 <- sx2_diff_exp %>%
  group_by(exp, subj_n, rep_n, b6c, bin) %>% 
  mutate(lower = exp_mean_diff-exp_ci,
         upper = exp_mean_diff+exp_ci,
         above_zero = if_else(lower > 0, "yes", "no"), 
         above_zero = factor(above_zero, levels = c("no", "yes")))
sx2_diff_exp_2
## # A tibble: 54,000 × 14
## # Groups:   exp, subj_n, rep_n, b6c, bin [54,000]
##    exp   subj_n rep_n b6c   bin   exp_mean_diff exp_sd     n exp_sem  exp_ci
##    <fct> <fct>  <fct> <fct> <fct>         <dbl>  <dbl> <int>   <dbl>   <dbl>
##  1 1     10     200   50%   1           0.00100 0.0165    10 0.00521 0.0102 
##  2 1     10     200   50%   2           0.00386 0.0777    10 0.0246  0.0482 
##  3 1     10     200   50%   3          -0.0676  0.108     10 0.0342  0.0671 
##  4 1     10     200   50%   4          -0.0189  0.258     10 0.0816  0.160  
##  5 1     10     200   50%   5           0.221   0.184     10 0.0580  0.114  
##  6 1     10     200   50%   6           0.163   0.176     10 0.0558  0.109  
##  7 1     10     200   60%   1          -0.006   0.0156    10 0.00493 0.00967
##  8 1     10     200   60%   2          -0.0135  0.0560    10 0.0177  0.0347 
##  9 1     10     200   60%   3           0.0214  0.0878    10 0.0278  0.0544 
## 10 1     10     200   60%   4          -0.00657 0.220     10 0.0696  0.136  
## # ℹ 53,990 more rows
## # ℹ 4 more variables: exp_dz <dbl>, lower <dbl>, upper <dbl>, above_zero <fct>

calculate “power” in a quick and dirty way based on 95% CI

power_x2 <- sx2_diff_exp_2 %>%
  group_by(subj_n, rep_n, b6c, bin) %>%
  mutate(check = ifelse(lower > 0, 1, 0)) %>%
  summarise(power = mean(check, na.rm = TRUE)) %>% 
  filter(bin %in% c(6))
## `summarise()` has grouped output by 'subj_n', 'rep_n', 'b6c'. You can override
## using the `.groups` argument.
power_x2
## # A tibble: 9 × 5
## # Groups:   subj_n, rep_n, b6c [9]
##   subj_n rep_n b6c   bin   power
##   <fct>  <fct> <fct> <fct> <dbl>
## 1 10     200   50%   6     0.682
## 2 10     200   60%   6     0.869
## 3 10     200   70%   6     0.983
## 4 15     200   50%   6     0.831
## 5 15     200   60%   6     0.956
## 6 15     200   70%   6     0.997
## 7 20     200   50%   6     0.932
## 8 20     200   60%   6     0.986
## 9 20     200   70%   6     0.999

plot power

p11.4 <- ggplot(power_x2, aes(x = b6c, y=subj_n, fill = power)) +
  geom_tile() +
  geom_text(aes(label = sprintf("%.3f", power)), color = "white", size = 10) +
  scale_fill_viridis_c(limits = c(0, 1)) +
  facet_wrap(~bin) +
  labs(y="subj_n", x="b6c")
p11.4

ggsave("Tutorial_4_planning/figures/sim2/power.jpeg",
       width = 10, height = 6, dpi=800)

join power to the df

sx2_diff_power <- sx2_diff_exp_2 %>%
  filter(bin %in% c(6)) %>% 
  inner_join(power_x2, by = c("subj_n", "rep_n", "b6c", "bin")) %>% 
  mutate(power = round(power * 100, 2)) 
head(sx2_diff_power)
## # A tibble: 6 × 15
## # Groups:   exp, subj_n, rep_n, b6c, bin [6]
##   exp   subj_n rep_n b6c   bin   exp_mean_diff exp_sd     n exp_sem exp_ci
##   <fct> <fct>  <fct> <fct> <fct>         <dbl>  <dbl> <int>   <dbl>  <dbl>
## 1 1     10     200   50%   6            0.163  0.176     10  0.0558 0.109 
## 2 1     10     200   60%   6            0.176  0.138     10  0.0435 0.0853
## 3 1     10     200   70%   6            0.295  0.221     10  0.0698 0.137 
## 4 1     15     200   50%   6            0.0716 0.0938    15  0.0242 0.0475
## 5 1     15     200   60%   6            0.106  0.135     15  0.0348 0.0683
## 6 1     15     200   70%   6            0.204  0.155     15  0.0401 0.0785
## # ℹ 5 more variables: exp_dz <dbl>, lower <dbl>, upper <dbl>, above_zero <fct>,
## #   power <dbl>

plot

p11.5 <- sx2_diff_power %>%
  ggplot(aes(x = exp, y = exp_mean_diff, ymin = lower, ymax = upper)) +
  geom_pointrange(fatten = 1/2, aes(colour=above_zero)) +
  geom_hline(yintercept = 0, colour = "red") +
  scale_colour_manual(values=c("darkgrey","black")) +
  geom_text(aes(x=700, y=-0.35, label = sprintf("%.1f%s", power, "% power")), 
            color = "darkgrey", size = 5) +
  theme(legend.position = "none") +
  labs(x = "exp # (i.e., simulated experiment index)",
       y = "hazard difference") +
  scale_x_discrete(breaks = c(250,500,750,1000)) +
  facet_grid(fct_rev(subj_n)~b6c)
p11.5

ggsave("Tutorial_4_planning/figures/sim2/bin6_diffs.jpeg",
       width = 10, height = 8, dpi=800)

plot power as a bar plot

p11.6 <- ggplot(power_x2, aes(x=subj_n, y=power,
                              colour = b6c, fill = b6c)) +
  geom_col(alpha = 0.5) +
  geom_hline(yintercept = 0.8, colour = "red", linetype = "dashed") +
  geom_hline(yintercept = 0.9, colour = "black", linetype = "dashed") +
  geom_hline(yintercept = 0.95, colour = "blue", linetype = "dashed") +
  scale_colour_brewer(palette = "Dark2") +
  scale_fill_brewer(palette = "Dark2") +
  labs(title = "Simulation 2") +
  theme(legend.position = "none") +
  facet_wrap(~b6c, nrow=1)
p11.6

ggsave("Tutorial_4_planning/figures/sim2/bin6_power_col.jpeg",
       width = 8, height = 6, dpi=800)

A summary of these power calculations might be as follows:

For a 70% reduction (a hazard ratio of 0.3), N=10 with 200 trials per condition would give nearly 100% power. For a 60% reduction (a hazard ratio of 0.4), N=10 with 200 trials per condition would give nearly 90% power. For a 50% reduction (a hazard ratio of 0.5), N=15 with 200 trials per condition would give over 80% power.

And a conclusion / judgment / decision process might be:

Well, like almost always, it depends on your objectives. Some considerations might be…

How much power or precision are you looking to obtain in this particular study? Are you running multiple studies that have some form of replication built in? What resources do you have at your disposal, such as time, money and personnel? How easy or difficult is it to obtain the specific type of sample?

My thoughts for studies in my lab might be something like this…

Pick 0.4 or 0.5 as a target effect size, since these are much smaller than the effects observed in published studies, then pick the corresponding N (i.e., N=10 or N=15) that takes you over the 80% power mark.
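That decision rule can be applied directly to the power table: for a chosen effect size, take the smallest N that clears the criterion. A sketch using values hand-copied from the power_x2 table above; the helper min_n_for() is illustrative, not part of the tutorial's pipeline:

```r
# smallest N that clears a power criterion, per effect size (b6c)
# power values hand-copied from the power_x2 table above
pow <- data.frame(
  subj_n = rep(c(10, 15, 20), each = 3),
  b6c    = rep(c("50%", "60%", "70%"), times = 3),
  power  = c(0.682, 0.869, 0.983,
             0.831, 0.956, 0.997,
             0.932, 0.986, 0.999)
)

# return the smallest subj_n whose power meets the criterion, NA if none do
min_n_for <- function(d, target_b6c, criterion = 0.8) {
  ok <- d[d$b6c == target_b6c & d$power >= criterion, ]
  if (nrow(ok) == 0) NA else min(ok$subj_n)
}

min_n_for(pow, "60%")  # N = 10 clears 80% for the 0.4 hazard ratio
min_n_for(pow, "50%")  # N = 15 clears 80% for the 0.5 hazard ratio
```

Raising the criterion (e.g., criterion = 0.95) shows why the choice of power target matters as much as the choice of effect size.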

But, and this is an important “but”, do not rely solely on one study. Run a follow-up experiment that replicates and extends the initial result. By doing so, you avoid the Cult of the Isolated Single Study and reduce the reliance on any one type of power analysis. Instead, you are aiming for common patterns across two or more experiments, rather than trying to make the case that a single study has sufficient evidential value to hit some criterion mark.

12. Join some plots together

Join the power column plots together from sim1 and sim2.

p12.1 <- (p10.4 | p11.6) +
  plot_annotation(tag_levels = 'A') +
  plot_layout(axes = 'collect')
p12.1

ggsave("Tutorial_4_planning/figures/bin6_power_col.jpeg",
       width = 12, height = 7, dpi=800)